Article

Some Results on the Function and Quality of Citation


... Other studies of disagreement have been performed in the context of classification schemes of citation function. In an early attempt to categorize types of citations, disagreement was captured as "juxtapositional" and "negational" citations (Moravcsik and Murugesan, 1975). However, this scheme was manually developed using a limited sample of papers and citations, and so the robustness and validity of the categories cannot be easily assessed. ...
... For example, the ability to differentiate between paper-level and community-level disagreement could lend insight into how conflict and controversy manifest in different fields. This definition could also be developed to differentiate further between types of disagreement: for example, past citation classification schemes have differentiated between "juxtaposition" and "negational" citations (Moravcsik and Murugesan, 1975), or between "weakness" and "contrast" citations (Teufel et al., 2006). ...
... The portability of our queries also means that they can readily be applied to other full-text data. The general method of generating and manually validating signal and filter terms can also be applied to other scientific phenomena, such as detecting uncertainty, negativity (Catalini et al., 2015), discovery (Small et al., 2017), or an expanded framework of disagreement (Moravcsik and Murugesan, 1975). ...
Article
Disagreement is essential to scientific progress, but the extent of disagreement in science, its evolution over time, and the fields in which it happens remain poorly understood. Here we report the development of an approach based on cue phrases that can identify instances of disagreement in scientific articles. These instances are sentences in an article that cite other articles. Applying this approach to a collection of more than four million English-language articles published between 2000 and 2015, we determine the level of disagreement in five broad fields within the scientific literature (biomedical and health sciences; life and earth sciences; mathematics and computer science; physical sciences and engineering; and social sciences and humanities) and 817 meso-level fields. Overall, the level of disagreement is highest in the social sciences and humanities, and lowest in mathematics and computer science. However, there is considerable heterogeneity across the meso-level fields, revealing the importance of local disciplinary cultures and the epistemic characteristics of disagreement. Analysis at the level of individual articles reveals notable episodes of disagreement in science, and illustrates how methodological artifacts can confound analyses of scientific texts.
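As a rough illustration of the cue-phrase idea described in this abstract, the sketch below flags sentences that contain a citation together with a signal term and a filter term. The term lists, the citation pattern, and the rule that both kinds of term must co-occur are assumptions made for this example; they are not the queries used by the authors.

```python
import re

# Hypothetical cue phrases; the authors' validated term lists are not
# reproduced in the abstract, so these are placeholders.
SIGNAL_TERMS = ["disagree", "contradict", "controvers"]
FILTER_TERMS = ["studies", "results"]

# Assumed shape of an author-year citation, e.g. "(Smith and Jones, 2010)".
CITATION_PATTERN = re.compile(r"\([A-Z][^()]*\d{4}[^()]*\)")


def is_candidate(sentence):
    """Return True if the sentence cites a work and matches both term lists."""
    lowered = sentence.lower()
    has_citation = CITATION_PATTERN.search(sentence) is not None
    has_signal = any(term in lowered for term in SIGNAL_TERMS)
    has_filter = any(term in lowered for term in FILTER_TERMS)
    return has_citation and has_signal and has_filter


# Made-up sentences, used only to exercise the function.
sentences = [
    "These results contradict earlier findings (Smith and Jones, 2010).",
    "We follow the method of (Smith, 2010).",
    "These results contradict our intuition.",
]
candidates = [s for s in sentences if is_candidate(s)]
```

In a real study, a sample of the flagged sentences would be validated by hand before any counts are reported, since substring matching alone produces false positives.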
... The existing methods using citation impact indicators like h-index and Journal Impact Factors (JIFs), which are based on citation frequency, have been used alongside the earlier peer-reviewing approaches for research evaluation (Aksnes et al., 2019). Traditional use of citation counts alone as an indicator for measuring the scientific impact of research publications, researchers, as well as research institutions has been widely criticised in the past (Kaplan, 1965; Moravcsik and Murugesan, 1975). The San Francisco Declaration on Research Assessment (DORA; https://sfdora.org/read/), released in 2013, includes 18 recommendations for improving research evaluation methods to mitigate the limitations of citation-count-based impact assessment methods. ...
... The apprehension concerning the appropriateness and the reliability of methodologies involving mere citation counting in the context of research evaluation constitutes a key application area ... Moravcsik and Murugesan (1975) found that out of 575 bibliographic references from 30 articles, 40% of citations were perfunctory and 33% of them were redundant, raising concerns about using citation counts as a quality measure. Research in this direction is often motivated by the observation that readers are not just interested in how many times a work is cited, but also why it is being cited (Lauscher et al., 2017). ...
... This connection between the research publications is accomplished through the use of citations, which act as a bridge between the citing and the cited document. The reason or motivation for citing a paper has been studied extensively by sociologists of science and information scientists in the past (Cano, 1989; Moravcsik and Murugesan, 1975; Nigel Gilbert, 1977; Oppenheim and Renn, 1978). Garfield et al. (1965), in their pioneering work, identify 15 reasons for citing a paper, a few of which are "Paying homage to pioneers, Giving credit for related work, Identifying method, equipment etc., Providing background reading" and so forth. ...
Article
The aim of this literature review is to examine the current state-of-the-art in the area of citation classification. In particular, we investigate the approaches for characterising citations based on their semantic type. We conduct this literature review as a meta-analysis covering 60 scholarly articles in this domain. Although we included some of the pioneering manual works in this review, more emphasis is placed on the later automated methods, which use Machine Learning and Natural Language Processing (NLP) for analysing the fine-grained linguistic features in the surrounding text of citations. The sections are organised based on the steps involved in the pipeline for citation classification. Specifically, we explore the existing classification schemes, datasets, pre-processing methods, the extraction of contextual and non-contextual features and the different types of classifiers and evaluation approaches. The review highlights the importance of identifying the citation types for research evaluation, the challenges faced by the researchers in the process and the existing research gaps in this field. Peer Review https://publons.com/publon/10.1162/qss_a_00159
... Studies on functions and types of citations have found that many citations can be classified as perfunctory. In the earliest study of this kind, Moravcsik & Murugesan (1975) constructed a classification scheme for references along four dimensions. One dimension was the contrast between organic and perfunctory citations. ...
... Their study showed, among other things, that 41% of the references were nonessential (perfunctory). Chubin & Moitra (1975) further developed the citation context classification of Moravcsik & Murugesan (1975) and applied it to a sample of 43 theoretical and experimental high-energy physics papers from four journals published in 1968-69. Their system also includes a 'perfunctory' citation category for 'papers referred to as related to the reported research without additional comment.' ...
Conference Paper
What accounts for the observed better quality of publication-level topical science clustering solutions which use only citation relations as input data, compared to those using sophisticated semantic similarity data derived from both citations and textual terms? A survey of empirical work relevant to the concept of unconscientious referencing practices indicates that purely citation-based methods should be affected by significant ‘citation noise’, unlike text-based methods. This study continues work with the Astro benchmarking data set for bibliometric clustering by applying semantic representation learning techniques to scientific documents in order to isolate the clustering performance difference between direct citations and textual terms. We investigate variants of Random Indexing embeddings learned on this data set and one pre-trained off-the-shelf semantic document embedding, SPECTER. The evaluation is performed with four previously introduced validation data sets but using a newly suggested clustering evaluation measure.
... The first issue is the gradient in citations across publications in a network, requiring a cut-off criterion to circumscribe the most referred publications. The second issue is that the citation index alone does not necessarily reflect theoretical links between citing and cited publications [12,15,17]. Therefore, it is necessary to differentiate citations that reveal theoretical links from other types of citations. ...
... In reading the excerpts of text, the volunteers could find citations that did not establish a conceptual link between the relevant publication and the citing study. A citation might just refer to a tool or method (in which case it was named 'operational'), it might not be used to structure an argument ('perfunctory'), or it might be used as an example of something wrong ('negational') (sensu Moravcsik & Murugesan [17]). The volunteers in our study were asked to disregard citations that they considered operational, perfunctory or negational. ...
Article
It has been proposed that ecological theory develops in a pragmatic way. This implies that ecologists are free to decide what, from the knowledge available to them, they will use to build models and learn about phenomena. Because in fields that develop pragmatically knowledge generation is based on the decisions of individuals and not on a set of predefined axioms, the best way to produce theoretical synthesis in such fields is to assess what individuals are using to support scientific studies. Here, we present an approach for producing theoretical syntheses based on the propositions most frequently used to learn about a defined phenomenon. The approach consists of (i) defining a phenomenon of interest; (ii) defining a collective of scientists studying the phenomenon; (iii) surveying the scientific studies about the phenomenon published by this collective; (iv) identifying the most referred publications used in these studies; (v) identifying how the studies use the most referred publications to give support to their studies and learn about the phenomena; (vi) and from this, identifying general propositions on how the phenomenon is approached, viewed and described by the collective. We implemented the approach in a case study on the phenomenon of ecological succession, defining the collective as the scientists currently studying succession. We identified three propositions that synthesize the views of the defined collective about succession. The theoretical synthesis revealed that there is no clear division between “classical” and “contemporary” succession models, and that neutral models are being used to explain successional patterns alongside models based on niche assumptions. By implementing the pragmatic approach in a case study, we show that it can be successfully used to produce syntheses based on the actual activity of the scientific community studying the phenomenon. The connection between the resulting synthesis and research activity can be traced back through the methodological steps of the approach. This result can be used to understand how knowledge is being used in a field of study and can guide better informed decisions for future studies.
... The study of citation function classification can be traced back as early as 1965, when Garfield [6] proposed fifteen categories of citation function; a similar scheme was introduced by Weinstock [29] in 1971. In 1975, Moravcsik and Murugesan [18] proposed a scheme encompassing four orthogonal facets of citation function: conceptual or operational, evolutionary or juxtapositional, organic or perfunctory, and confirmative or negational. Those works provide only limited analysis because the scale of the manually annotated data is small. ...
... Using both shallow and linguistically-inspired features, they presented a supervised machine learning framework to classify citation function automatically and achieved good performance. Inspired by Moravcsik and Murugesan's annotation scheme [18], Jochim and Schütze [13] introduced new features designed from lexical and linguistic features and integrated them into the Stanford MaxEnt classifier [17] to improve the classification accuracy. Abu-Jbara et al. [1] proposed an annotation scheme of six categories that mixes the function and sentiment of the citation. ...
Conference Paper
Citation function classification is an indispensable constituent of citation content analysis, which has numerous applications, ranging from improving informative citation indexers to facilitating resource search. Existing research primarily treats citation function classification as a sentence-level single-label task, ignoring some essential realistic phenomena and thereby creating problems such as data bias and noisy information. For instance, one scientific paper contains many citations, and each citation context may contain rich discussions of the cited paper, which may reflect multiple citation functions. In this paper, we propose a novel task of Document-level Multi-label Citation Function Classification in a bid to considerably extend the previous research works from a sentence-level single-label task to a document-level multi-label task. Given the complicated nature of the document-level citation function analysis, we propose a novel two-stage fine-tuning approach for a large-scale pre-trained language model. Specifically, we represent a citation as an independent token and fine-tune in two stages to better represent it in the document context. To enable this task, we accordingly introduce a new benchmark, i.e., TDMCite, encompassing 9594 citations (annotated for their function) from online scientific papers by leveraging a three-aspect citation function annotation scheme. Experimental results suggest that our approach results in a considerable improvement in contrast to the state-of-the-art BERT classification fine-tuning approaches.
... Although various extraneous factors contaminate citations, such as self-citations and contrasts across scientific specialties, these contaminants can be ameliorated with sufficient methodological precautions, making the total citations received by a scientist's lifetime publications a reasonable indicator of total impact (J. Cole & Cole, 1971;Lindsey, 1989;Moravcsik & Murugesan, 1975). Although several procedures exist for compiling a scientist's citations, current research lends some support to the h-index (Ruscio, Seaman, D'Oriano, Stremlo, & Mahalchik, 2012), which equals the number of publications cited at least h times (Hirsch, 2005). ...
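The h-index mentioned in this excerpt can be computed from a list of per-publication citation counts. The sketch below uses the usual reading of Hirsch's definition (the largest h such that h publications each have at least h citations); the function name and the sample counts are illustrative and do not come from the excerpt.

```python
def h_index(citations):
    """Largest h such that at least h publications have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one scientist's publications.
example_counts = [10, 8, 5, 4, 3]
example_h = h_index(example_counts)  # 4: four papers have at least 4 citations
```

Sorting first keeps the logic easy to check by hand: the loop stops at the first rank whose paper has fewer citations than its position in the ranking.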
Chapter
Science, Technology, and Society - edited by Todd L. Pittinsky November 2019
... Prompts are also used to interact with the conversational agent ChatGPT. Study of the purpose of citations has been done since as early as the 1960s and 70s [9]-[11]. Various automated approaches have been explored; examples include rule-based algorithm [12], nearest neighbour-based classification [13], support vector machine [14], multinomial naïve Bayes [14], and ensemble techniques [15]. ...
Preprint
Citations in scientific papers not only help us trace the intellectual lineage but also are a useful indicator of the scientific significance of the work. Citation intents prove beneficial as they specify the role of the citation in a given context. In this paper, we present CitePrompt, a framework which uses the hitherto unexplored approach of prompt-based learning for citation intent classification. We argue that with the proper choice of the pretrained language model, the prompt template, and the prompt verbalizer, we can not only get results that are better than or comparable to those obtained with the state-of-the-art methods but also do it with much less exterior information about the scientific document. We report state-of-the-art results on the ACL-ARC dataset, and also show significant improvement on the SciCite dataset over all baseline models except one. As suitably large labelled datasets for citation intent classification can be quite hard to find, in a first, we propose the conversion of this task to the few-shot and zero-shot settings. For the ACL-ARC dataset, we report a 53.86% F1 score for the zero-shot setting, which improves to 63.61% and 66.99% for the 5-shot and 10-shot settings, respectively.
... Then, to make this notion operational, we will now turn to the study of citational practices. Indeed, citation is an institutionalized practice in scientific research: citing is a performative action which labels the cited papers depending on the function the citing author attributes to them (Moravcsik & Murugesan, 1975). The reasons for citing can be very varied: persuasiveness, positive or negative credit, information, reader alert, etc. (Brooks, 1986). ...
Article
This article advocates the benefits of a sociological perspective for the philosophy of mathematical practice. Drawing from the literature of the sociology of sciences, it defends a community-centered approach to the study of mathematical practice and assesses the role of the notion of metamathematics in mathematical change and in stabilized mathematical practices. It relies on the case study of the emergence of geometric control theory at the beginning of the 1970s and of the citational practices associated with the community of control theory since the mid-1990s. The case study shows that the introduction of geometric tools in control theory at the end of the 1960s induced a change in the metamathematical views that control theorists had on their objects. It is then demonstrated how membership in the community of control theory shapes the production and the reception of the theorems of Stefan, Sussmann and Nagano. Interpreting the historical development and citational practices of this community through the perspective of metamathematics, this paper concludes by discussing the role of the orbit theorem in control theory, both as a cognitive label and as a social marker of membership in this community.
... The scrutiny of citation reasons encouraged researchers towards a critical assessment of mere citation-count analysis systems, wherein a pure count of citations is taken as the worth of a study. Moravcsik and Murugesan (1975) presented the pioneering study focusing on manual citation classification into four categories. Okumura Nanba (1999) manually classified citations into three classes to extract the worthy citation. ...
Article
Citation analysis-based systems are premised on assuming that all citations are equally important. The scientific community argues that a citation may hold divergent reasons and, thus, should not be treated at par. In this regard, a plethora of existing studies classify citations for varying reasons. Presently, the community has a propensity toward binary citation classification with the notion of contemplating only important reasons while employing quantitative analysis-based measures. We argue that outcomes yielded by the contemporary state-of-the-art models cannot be deemed ideal as the plethora of them has been evaluated on a data set with a minimal number of instances, due to which the outcomes cannot be generalized. The scope of results from such approaches is restricted to a single domain only, which may exhibit entirely different behaviour for the different data sets. Most of the studies are ruled by the content-based features evaluated by harnessing traditional classification models like Support Vector Machine (SVM) and random forest (RF), while an inconsiderable number of studies employ metadata which holds the potential to serve as a quintessential indicator, to tackle meaningful citations. In this study, we introduce a Multilayer perceptron artificial neural network (MLP-ANN) binary citation classifier, which exploits the best combinations of features formed using both sources. We also introduce a new benchmark data set from the electrical engineering domain, which is consolidated with two existing benchmark data sets for model evaluation. The outcomes reveal that the results produced by the proposed MLP model outperform the contemporary models achieving a precision of 0.92.
... Small subsequently distils five exclusive categories of references in which the cited work is refuted, noted only, reviewed, applied, or supported. This somewhat resembles another influential categorisation of citation motives, as established by Moravcsik and Murugesan (1975). They dichotomously distinguish between perfunctory vs. organic references; conceptual vs. operational references; evolutionary vs. juxtapositional references; and confirmative vs. negational references. ...
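One way to picture the four dichotomies listed in this excerpt is as a record with one two-valued field per dimension. The sketch below is only an illustration: the dimension names (`Substance`, `Relation`, `Necessity`, `Stance`), the `ReferenceLabel` class, and the sample labels are hypothetical, and it assumes every reference receives exactly one value on each dimension.

```python
from dataclasses import dataclass
from enum import Enum


# Dimension names are assumptions; the category values come from the excerpt.
class Substance(Enum):
    CONCEPTUAL = "conceptual"
    OPERATIONAL = "operational"


class Relation(Enum):
    EVOLUTIONARY = "evolutionary"
    JUXTAPOSITIONAL = "juxtapositional"


class Necessity(Enum):
    ORGANIC = "organic"
    PERFUNCTORY = "perfunctory"


class Stance(Enum):
    CONFIRMATIVE = "confirmative"
    NEGATIONAL = "negational"


@dataclass(frozen=True)
class ReferenceLabel:
    """One manually assigned label per dimension for a single reference."""
    substance: Substance
    relation: Relation
    necessity: Necessity
    stance: Stance


def share(labels, attribute, value):
    """Fraction of labelled references whose `attribute` equals `value`."""
    if not labels:
        return 0.0
    hits = sum(1 for label in labels if getattr(label, attribute) is value)
    return hits / len(labels)


# Made-up labels for two references, used only to exercise the code.
sample = [
    ReferenceLabel(Substance.CONCEPTUAL, Relation.EVOLUTIONARY,
                   Necessity.ORGANIC, Stance.CONFIRMATIVE),
    ReferenceLabel(Substance.OPERATIONAL, Relation.JUXTAPOSITIONAL,
                   Necessity.PERFUNCTORY, Stance.NEGATIONAL),
]
perfunctory_share = share(sample, "necessity", Necessity.PERFUNCTORY)  # 0.5
```

Keeping the dimensions as separate fields reflects that they are described as independent of one another: a reference can be, for example, both operational and negational.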
Article
Citing practices have long been at the heart of scientific reporting, playing both socially and epistemically important functions in science. While such practices have been relatively stable over time, recent attempts to develop automated citation recommendation tools have the potential to drastically impact citing practices. We claim that, even though such tools may come with tempting advantages, their development and implementation should be conducted with caution. Describing the role of citations in science’s current publishing and social reward structures, we argue that automated citation tools encourage questionable citing practices. More specifically, we describe how such tools may lead to an increase in: perfunctory citation and sloppy argumentation; affirmation biases; and Matthew effects. In addition, a lack of transparency of the tools’ underlying algorithmic structure renders their usage problematic. Hence, we urge that the consequences of citation recommendation tools should at least be understood and assessed before any attempts at implementation or broad distribution are undertaken.
... Partly because canonical papers tend to inspire follow-up research which builds on them, citation relationships have been widely used to quantify scientific impact (Bergstrom, West, & Wiseman, 2008; Cole & Cole, 1974; Garfield, 2006; Hirsch, 2005; King, 2004; Sinatra, Wang, Deville, Song, & Barabási, 2016; Uzzi, Mukherjee, Stringer, & Jones, 2013; Waltman, 2016; Wang et al., 2013; Way, Morgan, Larremore, & Clauset, 2019; Wu, Wang, & Evans, 2019; Wuchty, Jones, & Uzzi, 2007). At the same time, citations can be affected by myriad factors: publication venue, year, field of study, among many other reasons why authors cite a given paper (Aksnes, 2006; Moravcsik & Murugesan, 1975; Radicchi, 2012; Simkin & Roychowdhury, 2002), contributing to noise in evaluating and comparing their relative importance. ...
Preprint
Newton's centuries-old wisdom of standing on the shoulders of giants raises a crucial yet underexplored question: Out of all the prior works cited by a discovery, which one is its giant? Here, we develop a novel, discipline-independent method to identify the giant for any individual paper, allowing us to systematically examine the role and characteristics of giants in science. We find that across disciplines, about 95% of papers stand on the shoulders of giants, yet the weight of scientific progress rests on relatively few shoulders. Defining a new measure of giant index, we find that, while papers with high citations are more likely to be giants, for papers with the same citations, their giant index sharply predicts a paper's future impact and prize-winning probabilities. Giants tend to originate from both small and large teams, being either highly disruptive or highly developmental. And papers that did not have a giant but later became a giant tend to be home-run papers that are highly disruptive to science. Given the crucial importance of citation-based measures in science, the developed concept of giants may offer a useful new dimension in assessing scientific impact that goes beyond sheer citation counts.
... Until now, various studies have contended over whether all citations are equally important. In 1975, Moravcsik and Murugesan presented the first manual technique by classifying the citations into four categories [28]. ...
Article
The scientific community has presented various citation classification models to refute the concept of pure quantitative citation analysis systems wherein all citations are treated equally. However, a small number of benchmark datasets exist, which makes the asymmetric citation data-driven modeling quite complex. These models classify citations for varying reasons, mostly harnessing metadata and content-based features derived from research papers. Presently, researchers are more inclined toward binary citation classification with the belief that exploiting the datasets of incomplete nature in the best possible way is adequate to address the issue. We argue that contemporary ML citation classification models overlook essential aspects while selecting the appropriate features that hinder elutriating the asymmetric citation data. This study presents a novel binary citation classification model exploiting a list of potential natural language processing (NLP) based features. Machine learning classifiers, including SVM, KLR, and RF, are harnessed to classify citations into important and non-important classes. The evaluation is performed using two benchmark data sets containing a corpus of around 953 paper-citation pairs annotated by the citing authors and domain experts. The study outcomes exhibit that the proposed model outperformed the contemporary approaches by attaining a precision of 0.88.
... Instead of treating all citations equally (independent of their nature and positioning in the text), citation practices analysis is concerned with the reason for a citation (Stremersch et al., 2015; Tregua, Brozovic and D'Auria, 2021). For example, some types of citations indicate a more fundamental influence on the citing article, while other citations reflect indirect mentions or may even be perfunctory (Moravcsik and Murugesan, 1975). We argue that it is important to understand these differences to evaluate the trajectory of influential articles that shape a research stream. ...
Article
Purpose: The purpose of this study is to diagnose the trajectory of influential conceptual articles in developing a research stream. The authors uncover the knowledge diffusion through influential conceptual articles and identify characteristics that make conceptual articles influential in their field. Design/methodology/approach: This study draws on scientometrics, specifically an integrated approach combining quantitative citation counts with qualitative citation practices analysis that offers a comprehensive understanding of the nature and context of citations. The authors use the case of customer engagement – a prominent contemporary marketing and service research stream – to explore the trajectory of influential articles in shaping a new research stream. Findings: This research shows that influential articles contribute to the reciprocal knowledge diffusion within and outside their home discipline. They provide anchor points for conceptual framing, conceptual refining and conceptual reconciliation, three application patterns of citations that are pivotal to navigate theory discovery and theory justification in a research field. Research limitations/implications: The study analyzes the early impact period of two influential customer engagement articles to understand the developments leading to the establishment of a new research stream. Future research drawing on automated citation and bibliometric methods may consider extended time periods. Originality/value: This study traces the trajectory of influential articles in marketing and service research. The authors identify characteristics of influential conceptual articles, and recommend practices to develop a conceptual paper with the potential for an influential trajectory. It shows that while marketing and service research has a tradition of “borrowing” theories from other fields, seminal articles “lend” theories to other fields.
Article
Finding relevant research papers is a challenging task due to the enormous number of scientific publications published each year. In recent years, the scientific community has delved into the analysis of citations at a deep level, specifically examining the content of the papers, in order to identify more important documents. Citations serve as potential parameters for determining linkages between research articles. They have been extensively used for various academic purposes, such as calculating journal impact factors, determining researchers’ h-index, allocating research grants, and identifying the latest research trends. However, it has been argued by researchers that not all citations are equally influential. As a result, alternative techniques have been proposed to identify important citations based on content, metadata, and bibliographic information. Nevertheless, the current state-of-the-art approaches still require further improvement. Additionally, the use of deep learning models and word embedding techniques in this context has not been extensively studied. In this research work, we propose an approach based on two primary modules: 1) Section-wise citation count, and 2) metadata-based analysis of citation intent. Our study also involves conducting several experiments using deep learning models combined with FastText, word2vec, and BERT-based word embeddings to perform citation analysis. These experiments were conducted on two benchmark datasets, and the results were compared with a contemporary study that employed a rich set of content-based features for classification. Our findings indicate that the deep learning CNN model coupled with FastText word embedding achieves the best results in terms of accuracy, precision, and recall. It outperforms the existing state-of-the-art model with a precision score of 0.97.
Article
The COVID-19 pandemic provides a unique opportunity to study science communication and, in particular, the transmission of consensus. In this study, we show how “science communicators,” writ large to include both mainstream science journalists and practiced conspiracy theorists, transform scientific evidence into two dueling consensuses using the effectiveness of masks as a case study. We do this by compiling one of the largest, hand-coded citation datasets of cross-medium science communication, derived from 5 million Twitter posts of people discussing masks. We find that science communicators selectively uplift certain published works while denigrating others to create bodies of evidence that support and oppose masks, respectively. Anti-mask communicators in particular often use selective and deceptive quotation of scientific work and criticize opposing science more than pro-mask communicators. Our findings have implications for scientists, science communicators, and scientific publishers, whose systems of sharing (and correcting) knowledge are highly vulnerable to what we term adversarial science communication.
Article
Background: Evidence syntheses cite retracted publications. However, citation is not necessarily endorsement, as authors may be criticizing or refuting its findings. We investigated the sentiment of these citations (whether they were critical or supportive) and associations with the methodological quality of the evidence synthesis, reason for the retraction, and time between publication and retraction. Methods: Using a sample of 286 evidence syntheses containing 324 citations to retracted publications in the field of pharmacy, we used AMSTAR-2 to assess methodological quality. We used scite.ai and a human screener to determine citation sentiment. We conducted a Pearson’s chi-square test to assess associations between citation sentiment, methodological quality, and reason for retraction, and one-way ANOVAs to investigate association between time, methodological quality, and citation sentiment. Results: Almost 70% of the evidence syntheses in our sample were of critically low quality. We found that these critically low-quality evidence syntheses were more associated with positive statements while high-quality evidence syntheses were more associated with negative citation of retracted publications. In our sample of 324 citations, 20.4% of citations to retracted publications noted that the publication had been retracted. Conclusion: The association between high-quality evidence syntheses and recognition of a publication’s retracted status may indicate that best practices are sufficient. However, the volume of critically low-quality evidence syntheses ultimately perpetuates the citation of retracted publications with no indication of their retracted status. Strengthening journal requirements around the quality of evidence syntheses may lessen the inappropriate citation of retracted publications.
Article
Much effort has been devoted over the past decades to citation function classification, but noteworthy issues remain. Annotation difficulty has resulted in limited data size, especially for minority classes, and inadequate representativeness of the underlying scientific domains. Concerning algorithmic classification, state-of-the-art deep learning-based methods are flawed in that they generate a single feature vector for the whole citation context (or sentence) and fail to exploit the full range of citation modelling options. Responding to these issues, this paper studied contextualised citation function classification. Specifically, a large new citation context dataset was created by merging and re-annotating six datasets about computational linguistics. A variety of strong SciBERT-based citation function classification models were proposed, and new states of the art were achieved. Through deeper performance analysis, this study focused on answering several research questions about the effective ways of performing citation function classification. More specifically, the study justified the necessity of modelling in-text citations in context and confirmed the superiority of doing citation function classification at citation (segment) level. A particular emphasis was placed on in-depth per-class performance analysis to understand whether citation function classification is robust enough to suit various popular downstream applications and what further efforts are required to meet such analytic needs. Finally, a naïve ensemble classifier was proposed, which greatly improved citation function classification performance.
Article
Purely quantitative citation measures are widely used to evaluate research grants, to compare the output of researchers, or to benchmark universities. The intuition that not all citations are the same, however, can be illustrated by two examples. First, studies have shown that erroneous or controversial papers have higher citation counts. Second, does a high-level citation in an introduction have the same impact as a reference to a paper that serves as a conceptual starting point? Companions to purely quantitative measures are the so-called citation context analyses, which aim to obtain a better understanding of the link between citing and cited work. In this article, we propose a classification scheme for citation context analysis in the field of modelling in engineering. The categories were defined based on an extensive literature review and input from experts in the field of modelling. We propose a detailed scheme with six categories (Perfunctory, Background Information, Comparing/Confirming, Critique/Refutation, Inspiring, Using/Expanding) and a simplified scheme with three categories (High-level, Critical Analysis, Extending) that can be used within automatic classification approaches. The results of manually classifying 129 randomly selected citations show that 87% of citations fall into the high-level category. This study confirms that critical citations are not common in written academic discourse, even though criticism is essential for scientific progress and knowledge construction.
Article
We introduce and analyse a simple probabilistic model of article production and citation behavior that explicitly assumes that there is no decline in citability of a given article over time. It makes predictions about the number and age of items appearing in the reference list of an article. The latter topics have been studied before, but only in the context of data, and to our knowledge no models have been presented. We then perform large-scale analyses of reference list length for a variety of academic disciplines. The results show that our simple model cannot be rejected, and indeed fits the aggregated data on reference lists rather well. Over the last few decades, the relationship between total publications and mean reference list length is linear to a high level of accuracy. Although our model is clearly an oversimplification, it will likely prove useful for further modeling of the scholarly literature. Finally, we connect our work to the large literature on “aging” or “obsolescence” of scholarly publications, and argue that the importance of that area of research is no longer clear, while much of the existing literature is confused and confusing.
Article
With the exponential increase in the number of published articles, recommending them on the basis of the citation context (also called local or citation-aware citation recommendation) has attracted many researchers in the last few years. Recently, some papers have been devoted to reviewing previous works about scientific paper recommendation. As far as can be discerned, none of the previous review papers has carried out an in-depth study to explain citation context and compare previous studies. This paper presents a comparative analysis of recent studies about context-aware citation recommendation. Moreover, four gaps related to citation context extraction, citation context classification, temporal and structural aspects of a citation context, and benchmarking datasets are identified. This comparative study can assist researchers interested in further exploring these four gaps.
Chapter
Citation plays a pivotal role in determining the associations among research articles. It portrays essential information in indicative, supportive, or contrastive studies. The task of inline citation classification aids in extrapolating these relationships; however, existing studies are still immature and demand further scrutiny. Current datasets and methods used for inline citation classification only use citation-marked sentences, constraining the model to turn a blind eye to domain knowledge and neighboring contextual sentences. In this paper, we propose a new dataset, named 3Cext, which along with the cited sentences, provides discourse information using the vicinal sentences to analyze the contrasting and entailing relationships as well as domain information. We propose PeriCite, a Transformer-based deep neural network that fuses peripheral sentences and domain knowledge. Our model achieves state-of-the-art results on the 3Cext dataset, improving F1 by 0.09 over the best baseline. We conduct extensive ablations to analyze the efficacy of the proposed dataset and model fusion methods. Keywords: citation classification; bibliometrics; transformer
Article
Citation frequency is an important metric for the evaluation of academic papers, but it assumes that all citations are of equal value. The purpose of this study is to determine the validity of citation polarity, which contains evaluative information such as criticism or praise, in the evaluation of paper quality. In this paper, 3538 citation sentences in papers from ACL conferences were selected and manually annotated for citation polarity. They were divided into best paper group and matching paper group, and tested in heterologous pairs to determine whether there were differences in the positive and negative citations of the two groups, and to further investigate the trend of citation polarity with the increase of citation window. The results of the study showed that the best paper and the matching paper had significant differences in the number of positive and negative citations, and the mean and median values of positive citations in the best group were about 1.5 times higher than those in the matching group. As the citation window increased, the best papers maintained both positive and negative citation dominance over 5 years, and the peak citation in the best group was about three times higher than that in the matching group. Therefore, the metric of citation polarity can help evaluate the quality of papers and provide new ideas for scientific and objective evaluation of academic papers.
Article
This study investigated the extent to which seven papers on mechanical weed control have been cited, understood and utilised in subsequent publications. Web of Science was used to identify the citing publications, and citation content analysis was conducted to investigate the cognitive links between the citing and cited publications. Cognition involves acquiring, understanding, and using knowledge. Citation of the seven publications in 305 publications was classified, and it was found that perfunctory citations (those that were routinely referenced, with little effort to understand or use content) accounted for 53% of all citations and 16% of the citing articles included citations that were not supported by the references. The most striking finding was that key content was rarely used in articles, despite being referenced in 42% of the published articles. It is recommended that more time be allocated by authors to understanding literature, as this would appear to be a matter of diminishing concern for the scientific community. For those who assume their research area has a better citation practice than found in this study, it is recommended that authors conduct a citation content analysis within their own research area to increase the focus on good literature practices.
Chapter
This chapter surveys the evolution of the “research university” that initiated competition for professional recognition among scholars. Peer recognition became the basis of an unprecedented pyramidal distribution of internal prestige within the various disciplines, with a small number of extremely visible scientists occupying the top layer of the pyramid: the elite. Initial endeavors to identify the members of this elite were conducted by James McKeen Cattell, professor of psychology at Pennsylvania University and the editor of the biographical directory American Men of Science, who established a “star system” based on the judgment of members of the National Academy of Sciences, and by Robert I. Watson and Edwin G. Boring, who aggregated the peer ratings of psychologists. To identify eminence more comprehensively, researchers increasingly turned to citations as the key indicator of scholarly recognition. Despite some well-known methodological downsides, citation-based research, it is argued, provides the most accurate method for identifying disciplinary elites.
Article
Some science and technology policies promote collaboration between natural sciences (NS) and social sciences and humanities (SSH) on research topics related to complex societal issues such as climate change. However, there is a lack of empirical research on how and why a discipline uses knowledge from the same or another discipline. This study employs citation context analysis to explore the characteristics of citation behavior between NS and SSH. Specifically, focusing on climate change (SDG 13) and renewable energy (SDG 7), we classified related papers as either NS or SSH. Further, we analyzed how citation behavior differs by patterns of citations between disciplines, such as NS citing NS and NS citing SSH. The findings show that the sections where citations are more likely to be made or the citation purposes significantly differ by each pattern in each topic. In addition, it was common across both topics that NS tended to cite SSH frequently in the methodology section. While a typical collaboration pattern between NS and SSH is assumed as NS contributes methodologically to the solution of SSH research questions, the findings suggest that SSH contributes methodologically to NS. This study sheds new light on the exploration of knowledge flows between disciplines.
Article
This study characterizes the most cited retracted articles authored by Brazilian researchers, typifies post-retraction citations, and identifies patterns and outliers associated with the cited and citing documents analyzed. It uses the bibliometric method and the technique of citation analysis, and is exploratory in nature. The analysis of 512 citations distributed across 407 citing documents found that 75.8% were neutral citations, 23.0% were positive citations, and 1.2% were negative mentions. The prevalence of neutral citations shows that these articles continue to be cited as documents present in the literature without any judgment of their scientific validity, which also raises concerns about citation practices in academia. Keywords: retracted article; citation analysis; post-retraction citation; research integrity; research misconduct
Article
Traditional citation analyses use quantitative methods only, even though there is meaning in the sentences containing citations within the text. This article analyzes three citation meanings: sentiment, role, and function. We compare citation meanings patterns between fields of science and propose an appropriate deep learning model to classify the three meanings automatically at once. The data comes from Indonesian journal articles covering five different areas of science: food, energy, health, computer, and social science. The sentences in the article text were classified manually and used as training data for an automatic classification model. Several classic models were compared with the proposed multi-output convolutional neural network model. The manual classification revealed similar patterns in citation meaning across the science fields: (1) not many authors exhibit polarity when citing, (2) citations are still rarely used, and (3) citations are used mostly for introductions and establishing relations instead of for comparisons with and utilizing previous research. The proposed model’s automatic classification metric achieved a macro F1 score of 0.80 for citation sentiment, 0.84 for citation role, and 0.88 for citation function. The model can classify minority classes well concerning the unbalanced dataset. A machine model that can classify several citation meanings automatically is essential for analyzing big data of journal citations.
Article
A massive research corpus is generated in this epoch based on some previously established concepts or findings. To acknowledge this base knowledge, researchers cite it. Citations are the key considerations used in computing different research measures, such as ranking institutions, researchers, and countries, computing the impact factor of journals, allocating research funds, etc. But in calculating these critical measures, citations are treated equally. However, researchers have argued that all citations can never be equally influential. Therefore, researchers have proposed other techniques to identify the important content-based, meta-data-based, and bibliographic-based citations. However, the results produced by the state-of-the-art still need to be improved. In this research work, we proposed an approach based on two primary modules: 1) the section-wise citation count and 2) sentiment-based analysis of citation sentences. The first technique is based on extracting the different sections of the research articles and performing a citation count. We applied Neural Network and Multiple Regression on section-wise citations for automatic weight assignment. In the second approach, the citation sentences were extracted and sentiment analysis was applied to them. Citations were classified with Support Vector Machine, Multilayer Perceptron, and Random Forest. F-measure, Recall, and Precision were used to evaluate the results, which were compared with the state-of-the-art results. The value of precision with the proposed approach was enhanced to 0.94.
Article
This research proposes a new approach that considers citation relevance in main path analysis (MPA). Traditional MPA assumes that all citations have equal weight, but in practice treating every citation equally may not find the main paths that truthfully reflect the knowledge flow in a target science field. To address the issue, this study suggests taking the level of relevance among documents into consideration. For demonstration purposes, the level of relevance is determined by similarity in both citation structure and key phrases among documents. The approach not only achieves convergence of development trajectories, but also helps frame the topics on the main paths to a specific concept from a wide range of research domains. This study takes health interoperability fields as the demonstration case to show the effects of converging the trajectories toward a target domain.
Article
Purpose: Citations have been used as a common basis to measure the academic accomplishments of scientific books. However, traditional citation analysis has ignored content mining and has not considered citation equivalence, which may reduce evaluation reliability. Hence, this paper aims to integrate multi-level citation information to conduct multi-dimensional analysis. Design/methodology/approach: In this paper, books’ academic impacts were measured by integrating multi-level citation resources, including books’ citation frequencies and citation-related contents. Specifically, firstly, books’ citation frequencies were counted as the frequency-level metric. Secondly, content-level metrics were detected from multi-dimensional citation contents based on finer-grained mining, including topic extraction on the metadata and citation classification on the citation contexts. Finally, different metric weighting methods were compared to integrate the multi-level metrics and compute books’ academic impacts. Findings: The experimental results indicate that the integration of multiple citation resources is necessary, as it can significantly improve the comprehensiveness of the evaluation results. Meanwhile, compared with the type differences of books, disciplinary differences need more attention when evaluating the academic impacts of books. Originality/value: Academic impact assessment of books via integrating multi-level citation information can provide more detailed evaluation information and cover shortcomings of methods based on single citation data. Moreover, the method proposed in this paper is publication independent, and can be used to measure other publications besides books.
Article
The way retracted papers have been mentioned in post-retraction citations reflects the perception of the citing authors. The characteristics of post-retraction citations are therefore worth studying to provide insights into the prevention of the citation chain of retracted papers. In this study, full-text analysis is used to compare citation location and citation sentiment (attitudes and dispositions toward the cited work) between the conditions of correctly mentioning the retracted status (called CM) and not mentioning the retracted status (called NM). Statistical tests are carried out to explore the effect of CM on post-retraction citations in the field of psychology. It is shown that the citation sentiment of CM is equally distributed as negative, neutral, and positive, while for NM, it is mainly distributed as the latter two. CM papers tend to cite retracted papers in Methodology, whereas NM papers cite more in Theoretical Background and Conclusion. The perception efficiency of retractions in psychology is low, where the average unaware duration (UD, the period between when the retraction note has been published and when the first citation directly pointed out its retracted status) lasts for 2.88 years. Also, UD is negatively correlated with the quantity of CM and the growth rate of NM, the proportionate change of NM before and after the first CM paper appears (P < 0.01). After being aware of retractions, the average rate of change (ARC, the total change divided by its taken time) of NM declines significantly (Z = -2.823, P < 0.01) whereas CM sees a rise in most disciplines, which contributes to the reduction of possible interdisciplinary impact.
Article
For novice writers as well as EAP practitioners, citation use poses considerable challenges stemming from the writers’ limited understanding of disciplinary conventions, unwarranted source use, or rhetorical strategies. Although studies handling those challenges have reported typical citation functions in various disciplines, the explanation of functions in terms of content still confuses novice writers. Citation content per se is crucial to the specificity and reliability of the evidence presented. Drawing on a form-content-function integration, the present study examined the Introduction and the Results and Discussion sections in sixty research articles from physics, biology, education, and applied linguistics. We found that non-integral citations prevail across the four disciplines. We also found variations of citation contents and functions in the four disciplines and in part-genres. Specifying rhetorical functions and content in part-genres can thus enhance the evaluative power of source use.
Article
We analyzed co-citation patterns in 332,498 articles published in Anglophone psychology journals between 1946 and 1990 to estimate (1) when cognitive psychology first emerged as a clearly delineated subdiscipline, (2) how fast it grew, (3) to what extent it replaced other (e.g., behaviorist) approaches to psychology, (4) to what degree it was more appealing to scholars from a younger generation, and (5) whether it was more interdisciplinary than alternative traditions. We detected a major shift in the structure of co-citation networks between approximately 1955 and 1975 and draw novel conclusions about the developments commonly referred to as ‘the cognitive turn’.
Article
Newton’s centuries-old wisdom of standing on the shoulders of giants raises a crucial yet underexplored question: Out of all the prior works cited by a discovery, which one is its giant? Here, we develop a novel, discipline-independent method to identify the giant for any individual paper, allowing us to systematically examine the role and characteristics of giants in science. We find that across disciplines, about 95% of papers stand on the shoulders of giants, yet the weight of scientific progress rests on relatively few shoulders. Defining a new measure of giant index, we find that, while papers with high citations are more likely to be giants, for papers with the same citations, their giant index sharply predicts a paper’s future impact and prize-winning probabilities. Giants tend to originate from both small and large teams, being either highly disruptive or highly developmental. And papers that did not have a giant but later became a giant tend to be home-run papers that are highly disruptive to science. Given the crucial importance of citation-based measures in science, the developed concept of giants may offer a useful new dimension in assessing scientific impact that goes beyond sheer citation counts. Peer Review https://publons.com/publon/10.1162/qss_a_00186
Article
The extraction and classification of citation intent has long been studied, as it is a good measure of relevancy. Different approaches have classified citations into different classes, including weak and strong, positive and negative, important and unimportant. Others have gone further from binary classification to multi-class schemes, including extension, use, background, or comparison. Researchers have utilized various elements of the information, including both the metadata and the contents of the paper. The actual context of any referred article lies within the citation context where a paper is referred. Various attempts have been made to study the citation context to capture the citation intent, but very few have encoded the words to their contextual representations. For automated classification, we need to train deep learning models, which take the citation context as input and provide the reason for citing a paper. Deep neural models work on numeric data, and therefore, we must convert the text information to its numeric representation. Natural languages are much more complex than computer languages. Computer languages have a pre-defined fixed syntax where each word has a unique meaning. In contrast, every word in natural language may have a different meaning and may only be understood by considering its position, the previous discussion, and neighboring words. This extra information provides the context of a word within a sentence. We have, therefore, used contextual word representation, which is trained through deep neural networks. Deep models require massive data to generalize; however, the existing state-of-the-art datasets don’t provide enough information for the trained models to generalize. Therefore, we have developed our own scholarly dataset, Citation Context Dataset with Intent (C2D-I), an extension of the C2D dataset. We used a transformer-based model for capturing the contextual representation of words.
Our proposed method outperformed the existing benchmark methods with F1 score of 89%.
Article
This paper explores the knowledge transfer of internationally mobile scientists. It builds upon previous work on the development of methods for detecting the knowledge transfer of German scientists. Using abstract terms of publications covered in Scopus, this paper proposes a lexical‐based approach to identify knowledge transmitters. These scientists are characterized by acquiring knowledge from their co‐workers during their international stay and transferring it upon return to German co‐workers. Knowledge is operationalized as the co‐occurrence of rarely used abstract terms. Knowledge transfer is expressed as the diffusion of these term combinations in co‐authorship networks. The method developed was validated by contacting the bibliometrically identified knowledge transmitters and asking them what they believe they learned during their stay abroad. A control group of internationally mobile scientists without traceable knowledge transfer was similarly asked to report on their knowledge acquisition. The findings suggest that bibliometric data are capable of detecting knowledge transmitters among German scientists who were internationally mobile. The juxtaposition of the responses on their perceived knowledge acquisition and the bibliometrically identified lexical terms shows that the method proposed is well suited to studying the knowledge transfer of internationally mobile scientists. The strength of the method is its simplicity and high precision.
Article
Interdisciplinary research has attracted extensive attention from researchers and policymakers by its nature of integrating various types of knowledge from multiple disciplines to solve complex scientific problems. Besides the studies on citation-based interdisciplinary knowledge flow, recent efforts have been made to demystify the characteristics of knowledge integration in interdisciplinary research from a knowledge content perspective. To deeply understand the knowledge content integrated into interdisciplinary research, two tasks were formulated in this study. One was to identify the knowledge units integrated by an interdisciplinary field, which are defined as integrated knowledge phrases (IKPs) shared between citances and cited texts of the references. The other was to classify the identified IKPs into several knowledge categories, which could reflect their knowledge functions in the field. We proposed a methodological framework to automate the identification and classification of IKPs by using natural language processing techniques and deep learning models. This automatic methodology was tested on an eHealth dataset. The experiments showed that the baseline matching method and the word embedding based similarity matching method are effective for the identification task, and the Bidirectional Encoder Representations from Transformers (BERT) model using section titles and citances as input features achieved the best performance on the classification task, with an accuracy of 0.951. We further showcased the application of IKPs in the case study with expanded literature of eHealth. The two tasks were operated on the new dataset, then co-occurrence networks of IKPs were constructed and mapped to visualize the knowledge integration structure of the field.
This study provides a feasible content-based methodology to foster the fine-grained understanding of the knowledge integration structure of an interdisciplinary field, which could become a general domain analysis method.
Article
Citation is an important process of scientific activities, reflecting the inheritance and development of knowledge. However, citations representing different sentiment polarities function differently in knowledge construction, especially negative citations holding critical views, which deserve more in-depth study. This paper selected papers on SVM from 1995 to 2020, and used the stratified random sampling method to obtain 3,337 citation sentences from 46,157 citations, coding several attributes such as citation polarity, to analyze the relationship between negative citation and the impact of cited paper and the role of negative citation in the development of SVM technology. The results of the study found that negative citations do not reduce the literature impact; papers with a certain negative citation ratio would have a higher impact; and the impact of those partially dismissed papers would be even higher. In addition, negative citation presents different characteristics in different periods of the development of SVM, which has a certain promotion effect on the improvement of this technology.
Chapter
The research reported here aimed to analyze the citations made to Donald Schön in articles from the Brazilian field of Science Education. To that end, we conducted a bibliometric study of the citation-analysis type. In our research corpus, Schön appeared predominantly in citations characterized as organic, conceptual, evolutionary, and confirmatory.
Article
Although citations are widely used to measure the influence of scientific works, research shows that many citations serve rhetorical functions and reflect little-to-no influence on the citing authors. If highly cited papers disproportionately attract rhetorical citations then their citation counts may reflect rhetorical usefulness more than influence. Alternatively, researchers may perceive highly cited papers to be of higher quality and invest more effort into reading them, leading to disproportionately substantive citations. We test these arguments using data on 17,154 randomly sampled citations collected via surveys from 9,380 corresponding authors in 15 fields. We find that most citations (54%) had little-to-no influence on the citing authors. However, citations to the most highly cited papers were 2–3 times more likely to denote substantial influence. Experimental and correlational data show a key mechanism: displaying low citation counts lowers perceptions of a paper's quality, and papers with poor perceived quality are read more superficially. The results suggest that higher citation counts lead to more meaningful engagement from readers and, consequently, the most highly cited papers influence the research frontier much more than their raw citation counts imply.
Article
Citations play a fundamental role in supporting authors’ contribution claims throughout a scientific paper. Labelling citation instances with function labels is indispensable for understanding a scientific text. A single citation is the link between two scientific papers in the citation network. These citations carry rich native information, including the context of the citation, the citation location, the citing and cited paper titles, the DOI, and the website’s URL. Nevertheless, previous studies have ignored this information when building datasets, leaving the citation function classification task without comprehensive and valuable features. In this paper, we argue that such information should not be ignored, and accordingly we extract and integrate all of the native information features into different neural text representation models via trainable embeddings and free text. We first construct a new dataset, NI-Cite, comprising a large number of labelled citations with five key native features (Citation Context, Section Name, Title, DOI, Web URL) for each instance. In addition, we propose to use recently developed text representation models integrated with this information to evaluate performance on the citation function classification task. The experimental results demonstrate that the native information features suggested in this paper enhance overall classification performance.
Article
Critical citations lack a common definition in extant research on knowledge construction and citation analysis, even though studies on these topics seem to provide a fully relevant theoretical framework, one that treats criticism as essential to the progress of science. We propose to explain this paradox by the fact that a citation seems to have a positive polarity by default and that a polarity shift is the result of a stronger commitment on the part of the author. This results in the use of specific cue words. By studying the labels (and their associated definitions) that 53 other studies equated with the concept of critical citation, we identified 3 functions on which to base the definition of critical citation: "to criticize", "to compare" and "to question" other works. While these studies seem to consider the criticize function as central and probably more frequent, the analysis of a corpus of 51 text snippets containing a citation (all retrieved from those same studies) reveals that the citations considered critical by these same authors are often comparisons between results rather than blunt attacks against the cited works. This three-function definition and the set of wordings gathered in this study provide a new basis for the design of tools dedicated to citation polarity detection. Indeed, the lexical and grammatical markers characterizing comparison must be taken into account in addition to those expressing a negative evaluation and those expressing doubt.
Article
Many linguistic studies have analyzed the ways in which reported speech is used to mobilize knowledge in academic writing, but there have been far fewer such studies of knowledge mobilization in non-academic genres. This study analyzes the functions of reported speech in a Canadian quasi-judicial public inquiry report, a genre that is intertextually situated between research genres (through academic expert witnesses) and policy genres (through its role in making policy recommendations to the government). All instances of explicitly marked citation and reported speech in the commission report were identified and coded by function. The findings show citation and reported speech had specific functions that contributed to knowledge mobilization by discursively creating evidence, transporting worldviews and values, and changing knowledge status in the legal genres. The analysis also raises theoretical questions in linguistics, resulting in the argument that reported speech is not a static, formal category but a discursive status negotiated by the participants.
Article
Citation fulfils many functions in the scientific article. Beyond its primary function, which is to indicate that an idea has been borrowed from another author, it achieves at least two other objectives. On the one hand, it signals that the article containing it stands in a dialogic relationship with a set of earlier texts. On the other, by establishing links with earlier work, it seeks to increase the persuasive force of the article in which it appears. Citation can, moreover, have another objective, which is, however, much less studied: it can aim to enhance the effectiveness and precision of the language used to express the article's content. In this case, the citation traces and analyzes earlier uses of a term, often in order to clarify its meaning. This type of citation, which we call a semantic marker ("marqueur sémantique"; Collet 2016, 2018), is found, among other places, in articles in the humanities and social sciences. Our aim is to present the types of metalinguistic or semantic information that such a citation may contain. To this end, we draw on data from a corpus covering several disciplines in the humanities and social sciences, notably linguistics, psychology, sociology, and history. We also show through a quantitative analysis that the use of semantic markers is well established in these fields.
Article
Citation counts are commonly used to evaluate the scientific impact of a publication on the general premise that more citations probably mean more endorsements. However, two questionable assumptions underpin this idea: a) that all authors contributed equally to the paper; and b) that the endorsement is positive. Obviously, neither of these assumptions holds true. Hence, with this study, we examine two components of citations: their purpose, i.e., the reason for the citation, and their polarity, i.e., the author’s attitude toward the cited work. Our findings provide a new perspective on the scientific impact of highly-cited publications. Our methodology consists of three steps. Firstly, a pre-trained model composed of Word2Vec (a well-known word embedding approach) and a convolutional neural network (CNN) is used to identify citation polarity and purpose. Secondly, in a set of highly-cited papers, we compare eight categories of purpose, from foundational to critical, and three categories of polarity: positive, negative, and neutral. We further explore how different types of papers (those discussing discoveries or those discussing utilitarian topics) influence the evaluation of the scientific impact of papers. Finally, we mine and discover the knowledge (e.g. method, concept, tool or data) to explain the actual scientific impact of a highly-cited paper. To demonstrate how combining citation polarity with purpose can provide far greater detail about a paper’s scientific impact, we undertake a case study with 370 highly-cited journal articles spanning “Biochemistry & Molecular Biology” and “Genetics & Heredity”. The results yield valuable insights into the assumption about citation counts as a metric for evaluating scientific impact.