MTMvis built on the topic model obtained from the abstracts of the citing entities, shown across the three periods P1-P3. For each period, the visualization plots the topic distribution (e.g., topic 3 is the dominant topic in all three periods: P1, P2, and P3).

Source publication
Article
Full-text available
In this article, we show the results of a quantitative and qualitative analysis of open citations to a popular and highly cited retracted paper: “Ileal-lymphoid-nodular hyperplasia, non-specific colitis and pervasive developmental disorder in children” by Wakefield et al., published in 1998. The main purpose of our study is to understand the beha...

Contexts in source publication

Context 1
... MTMvis visualizations are plotted over the periods P1-P3 (Fig. 6) and the subject areas of the citing articles (Fig. 7). As shown in Fig. 6, topics 1, 2, and 5 steadily increased their percentages over time while, conversely, topics 4 and 9 decreased. Along the same lines, topics 3 and 11 showed a very similar pattern across the three periods. As shown in Fig. 7, some ...
Context 2
... MTMvis visualizations are plotted over the periods P1-P3 (Fig. 6) and the subject areas of the citing articles (Fig. 7). As shown in Fig. 6, topics 1, 2, and 5 steadily increased their percentages over time while, conversely, topics 4 and 9 decreased. Along the same lines, topics 3 and 11 showed a very similar pattern across the three periods. As shown in Fig. 7, some subject areas, such as medicine and social sciences, referred to almost all the ...
Context 3
... Fig. 16, we investigated the sections of the in-text citations marked as credits and cites as evidence. On the one hand, the credits citations were mostly distributed over descriptive sections (i.e., introduction, discussion, and background) during all three periods. Fig. 15 The four graphs illustrate the way the use of citation intents ...
Context 4
... Fig. 15 The four graphs illustrate the way the use of citation intents changed over time (i.e., the three periods P1, P2 and P3) and according to their perceived sentiment. The citation intents cites as evidence, critiques and credits are illustrated in separate charts, which show an increase in negative sentiment across the three periods. Fig. 16 The cites as evidence and credits citation intents distributions among the sections (the recognizable ones) and during the three periods (i.e., P1-P3). Fig. 17 The evolution over time of three groups of topics defined from the citation contexts of the in-text citations to WF-PUB-1998. On the other hand, the cites as evidence citations ...

Citations

... The percentage of citations that mention the retraction has grown over time: it was 33% in 2015 and 61% in 2017, the last year studied (Heibi & Peroni, 2021; cf. Suelzer et al., 2019). ...
... To efficiently identify the themes, we provided keywords to explore the possibilities within each cluster. To discover topics of the scholarly documents in each cluster, we employed the Latent Dirichlet Allocation (LDA) [25,26] approach for topic modeling. LDA represents each scholarly document as a distribution of topics and each topic as a distribution of keywords. ...
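As a rough illustration of the two-level representation described in this excerpt, here is a minimal topic-modeling sketch using gensim; the toy corpus, token lists, and parameter values are illustrative assumptions, not the cited study's actual data or settings:

```python
# Minimal LDA sketch (gensim): each document becomes a distribution over
# topics, and each topic a distribution over keywords. Toy data only.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["vaccine", "autism", "retraction", "citation"],
    ["topic", "model", "abstract", "citation"],
    ["vaccine", "retraction", "journal", "paper"],
]  # pre-tokenized abstracts (assumed input format)

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]  # bag-of-words vectors

lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=2, random_state=0, passes=10)

print(lda.get_document_topics(corpus[0]))  # document as topic distribution
print(lda.show_topic(0, topn=4))           # topic as keyword distribution
```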
Article
Full-text available
Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly document prediction task over categorized scholarly documents. We also introduce a new approach to enhance the document representation method with a domain-independent knowledge graph to find the influential scholarly document using categorized scholarly content. As the input collection, we use the WHO corpus with scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation method works better than the others. Of the various machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction, with the help of a domain-independent knowledge graph, specifically DBpedia, to enhance the document representation for predicting influential scholarly documents with categorical scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method. We also enhance the BOW document representation with the direct type (RDF type) and unqualified relation from DBpedia. In this experiment, we did not find any impact of the enhanced document representation on scholarly document category classification, but we did find an effect on influential scholarly document prediction with categorical data.
... One such example is the continued citation of Andrew Wakefield's article, whose purported link between autism and vaccination has been debunked [24], but which may still be influential amongst anti-vaccination activists and believers. While analyses of post-retraction citations of Wakefield's article indicated that many of these are made in a negative sense and that its retraction is widely acknowledged [25][26][27], concerns remain: some citations did not document the retraction [25], and, as pointed out by Leta and colleagues, "recent citing articles are highly cited and, even in a negative context, they contribute to the diffusion of a fraudulent article in the science context" [27]. Another potentially worrying example of post-retraction citation pertains to citations of retracted COVID-19 papers [28,29], which are largely made without reference to their retractions and in a non-critical manner. ...
Article
Full-text available
Once retracted, the citation count of a research paper might be intuitively expected to drop precipitously. Here, we assessed the post-retraction citation of life and medical sciences papers from two top-ranked, multidisciplinary journals, Nature and Science, from 2010 to 2018. Post-retraction citations accounted for a staggering 47.7% and 40.9% of total citations (median values), respectively, of the papers included in our analysis. These numbers are comparable with those from two journals with lower impact factors, and with retracted papers from the physical sciences discipline. A more qualitative assessment of five papers from the two journals with a high percentage (>50%) of post-retraction citations, all of which are associated with misconduct, reveals different contributing reasons and factors. Retracted papers associated with highly publicized misconduct cases are more prone to being cited with the retraction status indicated, or projected negatively (such as in the context of research ethics and misconduct discussions), with the latter also indicated by cross-disciplinary citations by humanities and social sciences articles. Retracted papers that retained significant validity in their main findings/conclusions may receive a large number of neutral citations that are somewhat blind to the retraction. Retracted papers in popular subject areas with massive publication outputs, particularly secondary publications such as reviews, may also have a high background citation noise. Our findings add further insights into the nature of post-retraction citations beyond the plain notion that these are largely made through sheer ignorance or negligence by the citing authors.
... Research on the citation behavior of retracted articles has focused on quantitative aspects, such as citation growth and altmetrics [20]. In 2021, Heibi and Peroni performed a citation analysis of Wakefield's retracted work [21], which claimed a (non-existent) association between vaccinations and autism [22]. They found that citations to Wakefield's paper continued to increase after retraction, but that most citations occur in general discussions. ...
Chapter
Full-text available
The amount of information in digital libraries (DLs) has been experiencing rapid growth. With the intense competition for research breakthroughs, researchers often intentionally or unintentionally fail to adhere to scientific standards, leading to the retraction of scientific articles. When a paper gets retracted, all its citing articles have to be verified to ensure the overall correctness of the information in digital libraries. Since this subjective verification is extremely time- and resource-consuming, we propose a triage process that focuses on papers that imply a dependence on retracted articles, thus requiring further reevaluation. This paper seeks to establish a systematic approach for identifying and scrutinizing scholarly works that draw upon retracted work through direct citations, thus emphasizing the importance of further evaluation within the scholarly discourse. Firstly, we categorized and identified the intention in the citation context using verbs with predicative complements and cue phrases. Secondly, we classified the citation intentions of the retracted articles into dependent (if the citing paper is based on or incorporates part of the cited retracted work) and non-dependent (if the citing article discusses, criticizes, or negates the cited work). Finally, we compared our approach with the existing state-of-the-art literature and found that our proposed triage process can aid in ensuring the integrity of scientific literature, thereby enhancing its quality.
... In contrast, for non-CS papers, problematic data, results, and duplicate and plagiarised papers dominated. Heibi and Peroni [25] have also noted how computer science seems distinct from other disciplines. However, the single largest difference is not apparent from the chart: for approximately 56% of CS papers, but only 26% of all other papers, little or no information is publicly available as to why the paper was retracted. ...
Article
Full-text available
Context: The retraction of research papers, for whatever reason, is a growing phenomenon. However, although retracted paper information is publicly available via publishers, it is somewhat distributed and inconsistent. Objective: The aim is to assess: (i) the extent and nature of retracted research in Computer Science (CS), (ii) the post-retraction citation behaviour of retracted works, and (iii) the potential impact upon systematic reviews and mapping studies. Method: We analyse the Retraction Watch database and take citation information from the Web of Science and Google Scholar. Results: We find that of the 33,955 entries in the Retraction Watch database (16 May 2022), 2,816 are classified as CS, i.e., ≈ 8%. For CS, 56% of retracted papers provide little or no information as to the reasons. This contrasts with 26% for other disciplines. There is also some disparity between different publishers, a tendency for multiple versions of a retracted paper to be available beyond the Version of Record (VoR), and for new citations long after a paper is officially retracted (median = 3; maximum = 18). Systematic reviews are also impacted, with ≈ 30% of the retracted papers having one or more citations from a review. Conclusions: Unfortunately, retraction seems to be a sufficiently common outcome for a scientific paper that we as a research community need to take it more seriously, e.g., standardising procedures and taxonomies across publishers and the provision of appropriate research tools. Finally, we recommend particular caution when undertaking secondary analyses and meta-analyses which are at risk of becoming contaminated by these problem primary studies.
... 42 In 2021, Heibi and Peroni studied the citation patterns in a twenty-year window (1998-2018) of a highly cited paper that was published in 1998 in The Lancet, partially retracted in 2004, and fully retracted in 2010, finding that the paper had accumulated 130, 148, and 337 citations by these three dates, respectively, many of which offered positive support to the paper or even failed to indicate that it was retracted. 43 Ultimately, for journals that employ those citations for metrics-based 'rankings,' such as the Clarivate Analytics journal impact factor (JIF) or Elsevier's CiteScore, if such citations are abused, for example, JIF or CiteScore built on retracted papers (i.e., potentially invalid citations), then such journals and publishers might benefit unfairly. 44 In such cases, to be fair to other journals whose JIF, CiteScore or other metrics might not have been dependent on the citation of retracted literature, those metrics need to be adjusted downward as a corrective, not punitive, measure. ...
Article
Full-text available
Citations in a scientific paper reference other studies and form the information backbone of that paper. If cited literature is valid and non-retracted, an analysis of citations can offer unique perspectives on the supportive or contradictory nature of a statement. Yet, such analyses are still limited by the relative lack of access to open citation data. The creation of open citation databases (OCDs) allows data analysts, bibliometric specialists and other academics interested in such topics to independently verify the validity and accuracy of a citation. Since the strength of an individual's curriculum vitae can be based on, and assessed by, metrics (citation counts, altmetric mentions, journal ranks, etc.), there is interest in appreciating citation networks and their link to research performance. Open citations would thus not only benefit career, funding and employment initiatives; they could also be used to reveal citation rings, abusive author-author or journal-journal citation strategies, or to detect false or erroneous citations. OCDs should be open to the public, and publishers have a moral responsibility to release citation data for free use and academic exploration. Some challenges remain, including long-term funding, and data and information security.
... In the second step, we created vectors for each of the generated tokens using a Bag of Words (BoW) model 7, which we considered appropriate for our study given our direct experience in previous findings (Heibi & Peroni, 2021a) and the suggestions by Bengfort et al. (2018) on the same issue. Finally, to build the LDA topic model, we determined in advance the number of topics to retrieve for the examined corpus using a popular method based on the value of the topic coherence score, as suggested in (Schmiedel et al., 2019), which can be used to measure the degree of semantic similarity between high-scoring words in the topic. ...
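One common way to operationalize this coherence-based choice of the number of topics is sketched below with gensim; the helper name, the candidate range, and all parameter values are assumptions for illustration, not the study's actual pipeline:

```python
# Sketch: pick the number of LDA topics by maximizing the c_v coherence
# score over a candidate range. Inputs (tokenized texts) are assumed.
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

def best_num_topics(texts, candidates=range(2, 15)):
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    scores = {}
    for k in candidates:
        lda = LdaModel(corpus=corpus, id2word=dictionary,
                       num_topics=k, random_state=0, passes=5)
        cm = CoherenceModel(model=lda, texts=texts,
                            dictionary=dictionary, coherence="c_v")
        scores[k] = cm.get_coherence()  # higher = more coherent topic words
    return max(scores, key=scores.get), scores
```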
... The opposite trend was observed in other disciplines, according to prior studies, such as biomedicine (Dinh et al., 2019) and psychology. However, prior studies such as (Heibi & Peroni, 2021a) and (Schneider et al., 2020) also observed that in the health sciences domain there were cases where either a single or a few popular cases of retraction were characterized by an increase in citations after the retraction. This might suggest that the discipline related to the retracted publication is not the only central factor to consider for predicting the citation trend after the retraction. ...
... (Peroni & Shotton, 2012), an ontology for the characterization of factual and rhetorical bibliographic citations. We used the decision model developed and adopted in (Heibi & Peroni, 2021a) to decide which citation function to select to label an in-text citation. Figure 4 shows part of the decision model; it presents the case when the intent of the citation is "Reviewing and eventually giving an opinion on the cited entity" and the citation function is part of one of the following groups: "Consistent with", "Inconsistent with", or "Talking about". ...
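As a purely hypothetical illustration of what that branch of such a decision model might look like in code, the sketch below maps a reviewing-type in-text citation to one of the three function groups named in the excerpt; the predicate name and the CiTO property suggestions are assumptions, not the authors' actual model:

```python
# Hypothetical sketch of one branch of the citation-function decision model:
# intent = "Reviewing and eventually giving an opinion on the cited entity".
# The group names come from the excerpt; the logic here is illustrative.
from typing import Optional

def citation_function_group(agrees_with_cited: Optional[bool]) -> str:
    """Map a reviewing-type citation to a function group (CiTO-style)."""
    if agrees_with_cited is True:
        return "Consistent with"    # e.g., cito:agreesWith
    if agrees_with_cited is False:
        return "Inconsistent with"  # e.g., cito:disagreesWith
    return "Talking about"          # neutral review, e.g., cito:discusses
```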
Article
Full-text available
In this article, we show and discuss the results of a quantitative and qualitative analysis of open citations to retracted publications in the humanities domain. Our study was conducted by selecting retracted papers in the humanities domain and marking their main characteristics (e.g., retraction reason). Then, we gathered the citing entities and annotated their basic metadata (e.g., title, venue, etc.) and the characteristics of their in-text citations (e.g., intent, sentiment, etc.). Using these data, we performed a quantitative and qualitative study of retractions in the humanities, presenting descriptive statistics and a topic modeling analysis of the citing entities’ abstracts and the in-text citation contexts. As part of our main findings, we noticed that there was no drop in the overall number of citations after the year of retraction, with few entities either mentioning the retraction or expressing a negative sentiment toward the cited publication. In addition, on several occasions, we noticed a higher concern/awareness about citing a retracted publication among citing entities belonging to the health sciences domain, compared to the humanities and the social sciences domains. Philosophy, arts, and history are the humanities areas that showed the highest concern toward the retraction.
... The study was fully retracted in 2010 due to lack of ethical approval, flawed study design, and "incorrect" elements [7], but has continued to receive citations in recent years. The majority of these citing papers disagree with the Wakefield paper and/or acknowledge its retraction [8,9], perhaps in part due to the amount of attention its retraction received. However, despite its retraction, the paper may have had a negative impact on public opinion of the MMR vaccine [10], as well as on vaccination rates, e.g., in Britain [11]. ...
Article
Full-text available
Background Retraction is a mechanism for alerting readers to unreliable material and other problems in the published scientific and scholarly record. Retracted publications generally remain visible and searchable, but the intention of retraction is to mark them as “removed” from the citable record of scholarship. However, in practice, some retracted articles continue to be treated by researchers and the public as valid content as they are often unaware of the retraction. Research over the past decade has identified a number of factors contributing to the unintentional spread of retracted research. The goal of the Reducing the Inadvertent Spread of Retracted Science: Shaping a Research and Implementation Agenda (RISRS) project was to develop an actionable agenda for reducing the inadvertent spread of retracted science. This included identifying how retraction status could be more thoroughly disseminated, and determining what actions are feasible and relevant for particular stakeholders who play a role in the distribution of knowledge. Methods These recommendations were developed as part of a year-long process that included a scoping review of empirical literature and successive rounds of stakeholder consultation, culminating in a three-part online workshop that brought together a diverse body of 65 stakeholders in October–November 2020 to engage in collaborative problem solving and dialogue. Stakeholders held roles such as publishers, editors, researchers, librarians, standards developers, funding program officers, and technologists and worked for institutions such as universities, governmental agencies, funding organizations, publishing houses, libraries, standards organizations, and technology providers. Workshop discussions were seeded by materials derived from stakeholder interviews (N = 47) and short original discussion pieces contributed by stakeholders. The online workshop resulted in a set of recommendations to address the complexities of retracted research throughout the scholarly communications ecosystem. Results The RISRS recommendations are: (1) Develop a systematic cross-industry approach to ensure the public availability of consistent, standardized, interoperable, and timely information about retractions; (2) Recommend a taxonomy of retraction categories/classifications and corresponding retraction metadata that can be adopted by all stakeholders; (3) Develop best practices for coordinating the retraction process to enable timely, fair, unbiased outcomes; and (4) Educate stakeholders about pre- and post-publication stewardship, including retraction and correction of the scholarly record. Conclusions Our stakeholder engagement study led to 4 recommendations to address inadvertent citation of retracted research, and formation of a working group to develop the Communication of Retractions, Removals, and Expressions of Concern (CORREC) Recommended Practice. Further work will be needed to determine how well retractions are currently documented, how retraction of code and datasets impacts related publications, and to identify if retraction metadata (fails to) propagate. Outcomes of all this work should lead to ensuring retracted papers are never cited without awareness of the retraction, and that, in public fora outside of science, retracted papers are not treated as valid scientific outputs.
... In contrast, for non-CS papers, problematic data, results, and duplicate and plagiarised papers dominated. Heibi and Peroni [20] have also noted how computer science seemed to be distinct from other disciplines. ...
Preprint
Full-text available
Context: The retraction of research papers, for whatever reason, is a growing phenomenon. However, although retracted paper information is publicly available via publishers, it is somewhat distributed and inconsistent. Objective: The aim is to assess: (i) the extent and nature of retracted research in Computer Science (CS), (ii) the post-retraction citation behaviour of retracted works, and (iii) the potential impact on systematic reviews and mapping studies. Method: We analyse the Retraction Watch database and take citation information from the Web of Science and Google Scholar. Results: We find that of the 33,955 entries in the Retraction Watch database (16 May 2022), 2,816 are classified as CS, i.e., approximately 8.3%. For CS, 56% of retracted papers provide little or no information as to the reasons. This contrasts with 26% for other disciplines. There is also a remarkable disparity between different publishers, a tendency for multiple versions of a retracted paper over and above the Version of Record (VoR), and for new citations long after a paper is officially retracted. Conclusions: Unfortunately, retraction seems to be a sufficiently common outcome for a scientific paper that we as a research community need to take it more seriously, e.g., standardising procedures and taxonomies across publishers and the provision of appropriate research tools. Finally, we recommend particular caution when undertaking secondary analyses and meta-analyses which are at risk of becoming contaminated by these problem primary studies.
... (Peroni & Shotton, 2012), an ontology for the characterization of factual and rhetorical bibliographic citations. We used the decision model developed and summarized in Figure 4, already adopted in (Heibi & Peroni, 2021a), to decide which citation function to select to label an in-text citation. We do not introduce the full details of the labelling process due to space constraints; an extensive introduction and explanation can be found in (Heibi & Peroni, 2021b). ...
... In the second step, we created vectors for each of the generated tokens using a Bag of Words (BoW) model (Brownlee, 2019), which we considered appropriate to model our study considering our direct experience in previous findings (Heibi & Peroni, 2021a) and the suggestions by Bengfort et al. (2018) on the same issue. Finally, to build the LDA topic model, we determined in advance the number of topics to retrieve according to the examined corpus using a popular method based on the value of the topic coherence score, as suggested in (Schmiedel et al., 2019), which can be used to measure the degree of the semantic similarity between high-scoring words in the topic. ...
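The BoW vectorization step described here can be sketched as follows with gensim; the toy token lists and the pruning thresholds are assumptions for illustration, not the study's settings:

```python
# Sketch of the BoW step: map tokenized texts to sparse (token_id, count)
# vectors. The corpus and the pruning thresholds below are toy assumptions.
from gensim.corpora import Dictionary

tokens = [["citation", "retraction", "humanities", "citation"],
          ["citation", "topic", "model", "context"]]

dictionary = Dictionary(tokens)
# Prune tokens appearing in more than half of the documents ("citation"
# here) and, in real corpora, tokens appearing in too few documents.
dictionary.filter_extremes(no_below=1, no_above=0.5)
bow_corpus = [dictionary.doc2bow(t) for t in tokens]
print(bow_corpus[0])  # sparse [(token_id, count), ...] vector
```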
... psychology. However, prior studies such as (Heibi & Peroni, 2021a) and (Schneider et al., 2020) also observed that in the health sciences domain there were cases where either a single or a few popular cases of retraction were characterized by an increase in citations after the retraction. This might suggest that the discipline related to the retracted article is not the only central factor to consider for predicting the citation trend after the retraction, and that other factors might play a crucial role, such as the popularity of and media attention to the retraction case, as discussed in the studies by Mott et al. (2019) and Bar-Ilan and Halevi (2017). ...
Preprint
Full-text available
In this article, we show and discuss the results of a quantitative and qualitative analysis of citations to retracted publications in the humanities domain. Our study was conducted by selecting retracted papers in the humanities domain and marking their main characteristics (e.g., retraction reason). Then, we gathered the citing entities and annotated their basic metadata (e.g., title, venue, subject, etc.) and the characteristics of their in-text citations (e.g., intent, sentiment, etc.). Using these data, we performed a quantitative and qualitative study of retractions in the humanities, presenting descriptive statistics and a topic modeling analysis of the citing entities' abstracts and the in-text citation contexts. As part of our main findings, we noticed a continuous increase in the overall number of citations after the retraction year, with few entities either mentioning the retraction or expressing a negative sentiment toward the cited entities. In addition, on several occasions we noticed a higher concern and awareness about citing a retracted article among citing entities belonging to the health sciences domain, compared to the humanities and the social sciences domains. Philosophy, arts, and history are the humanities areas that showed the highest concern toward the retraction.