Read before you cite!
M.V. Simkin and V.P. Roychowdhury
Department of Electrical Engineering,
University of California, Los Angeles, CA
90095-1594
Abstract. We report a method of estimating what percentage of people who cited a paper had actually read it. The method is based on a stochastic modeling of the citation process that explains empirical studies of misprint distributions in citations (which we show follow a Zipf law). Our estimate is that only about 20% of citers read the original.¹
Many psychological tests have the so-
called lie-scale. A small but sufficient number
of questions that admit only one true answer,
such as “Do you always reply to letters
immediately after reading them?” are inserted
among others that are central to the particular
test. A wrong reply to such a question adds a point on the lie-scale, and when the lie-score is high, the overall test results are discarded as unreliable. Perhaps, for a scientist the best
candidate for such a lie-scale is the question
“Do you read all of the papers that you cite?”
Comparative studies of the popularity of scientific papers have been a subject of much recent interest [1]-[8], but their scope has been limited to citation distribution analysis. We have
discovered a method of estimating what
percentage of people who cited the paper had
actually read it. Remarkably, this can be
achieved without any testing of the scientists,
but solely on the basis of the information
available in the ISI citation database.
¹ Acknowledging the subjectivity inherent in what reading might mean to different individuals, we generously consider a “reader” as someone who has at the very least consulted a trusted source (e.g., the original paper or heavily-used authenticated databases) in putting together the citation list.
Freud [9] discovered that the application of his technique of psychoanalysis to slips in speech and writing could reveal a lot of hidden information about human psychology. Similarly, we find that the application of statistical analysis to misprints in scientific citations can give an insight into the process of scientific writing. As in the Freudian case, the truth revealed is embarrassing. For example, an interesting statistic revealed in our study is that many misprints are identical. Consider, for example, a 4-digit page number with one digit misprinted. There can be $10^4$ such misprints. The probability of repeating someone else’s misprint accidentally is $10^{-4}$. There should be almost no repeat misprints by coincidence. One concludes that repeat misprints are due to copying someone else’s reference, without reading the paper in question.
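To make this arithmetic concrete, here is a back-of-the-envelope check of our own (not part of the original analysis). It assumes that every misprint in the data set analyzed below (196 misprints, 45 of them distinct) were an independent accident, uniformly distributed over the $10^4$ assumed variants, and compares the repeats expected by coincidence with the repeats observed:

```python
from math import comb

T = 196            # total misprints observed (value reported below for the studied paper)
D_observed = 45    # distinct misprints observed
variants = 10**4   # assumed number of equally likely misprint variants (paper's figure)

# If every misprint were an independent, uniform accident:
expected_distinct = variants * (1 - (1 - 1 / variants) ** T)
expected_repeats = T - expected_distinct            # repeats expected by pure coincidence
expected_pairs = comb(T, 2) / variants              # birthday-style estimate of colliding pairs

print(f"repeats expected by coincidence: {expected_repeats:.1f}")   # about 2
print(f"repeats actually observed:       {T - D_observed}")         # 151
```

Even under this generous assumption, coincidence accounts for only a couple of the 151 repeats actually observed.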
In principle, one can argue that an author might copy a citation from an unreliable reference list, but still read the paper. A modest reflection would convince one that this is relatively rare and cannot apply to the majority. Surely, in the pre-internet era it took almost equal effort to copy a reference as to type in one’s own based on the original, thus providing little incentive to copy if someone had indeed read, or at the very least had procured access to, the original. Moreover, if someone accesses the original by tracing it from the reference list of a paper with a misprint, then with high likelihood the misprint has been identified and will not be propagated. In the past decade, with the advent of the Internet, copying from unreliable sources and accessing the original have become equally convenient for would-be non-readers and would-be readers alike, but there is no increased incentive for those who read the original to also make verbatim copies, especially from unreliable resources.² In the rest of this paper, giving the benefit of the doubt to potential non-readers, we adopt a much more generous view of a “reader” of a cited paper, as someone who at the very least consulted a trusted source (e.g., the original paper or heavily-used and authenticated databases) in putting together the citation list.

² According to many researchers, the Internet may end up even aggravating the copying problem: more users are copying second-hand material without verifying or referring to the original sources.
As misprints in citations are not too frequent, only celebrated papers provide enough statistics to work with. Figure 1 shows the distribution of misprints in citations of one such paper [10] in the rank-frequency representation, introduced by Zipf [11]. The most popular
misprint in a page number propagated 78 times.
Figure 2 shows the same data, but in the
number-frequency format.
As a preliminary attempt, one can estimate an upper bound on the ratio of the number of readers to the number of citers, R, as the ratio of the number of distinct misprints, D, to the total number of misprints, T:³

$$R \le D/T. \qquad (1)$$

Substituting $D = 45$ and $T = 196$ in Eq. (1), we obtain $R \le 0.23$. This estimate would be correct if the people who introduced original misprints had always read the original paper. However, given the low value of the upper bound on R, it is obvious that many original misprints were introduced while copying references. Therefore, a more careful analysis is needed. We need a model to accomplish it.

³ We know for sure that among the T citers, $T - D$ copied, because they repeated someone else’s misprint. For the D others, with the information at hand, we do not have any evidence that they did not read, so according to the presumption of innocence we assume that they read. Then in our sample we have D readers and T citers, which leads to Eq. (1).
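For concreteness, the naive bound of Eq. (1) with the numbers quoted above is a one-liner (the values are those given in the text):

```python
D, T = 45, 196                     # distinct and total misprints quoted in the text
print(f"R <= D/T = {D / T:.2f}")   # prints R <= D/T = 0.23
```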
Figure 1. Rank-frequency distribution of misprints in references to a paper that had acquired 4300 citations. There are 196 misprints in total, of which 45 are distinct. The most popular misprint propagated 78 times. A good fit to Zipf's law is evident.
Figure 2. Same data as in Figure 1, but in the number-
frequency representation. Misprints follow a power-law
distribution with exponent close to 2.
Our model for misprints propagation,
which was stimulated by Simon’s [12]
explanation of Zipf Law and Krapivsky-Redner
[4] idea of link redirection, is as follows. Each
new citer finds the reference to the original in
any of the papers that already cite it. With
probability R he reads the original. With
probability 1-R he copies the citation to the
original from the paper he found the citation in.
In any case, with probability M he introduces a
new misprint.
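A minimal Monte Carlo sketch of this process may help; it is our own illustration (the parameter values and function name are ours, not the authors’), and it simply replays the verbal rules above for a fixed number of citations:

```python
import random

def simulate(n_citations=4300, R=0.2, M=0.01, seed=0):
    """Replay the citation model: each new citer reads the original with
    probability R or copies a random earlier citation otherwise, and in
    either case introduces a fresh misprint with probability M.
    Returns (T, D): total and distinct misprints among the citations."""
    rng = random.Random(seed)
    citations = [0]          # misprint id carried by each citation; 0 means "correct"
    next_id = 1              # ids for newly introduced misprints (all assumed distinct)
    for _ in range(n_citations - 1):
        carried = 0 if rng.random() < R else rng.choice(citations)
        if rng.random() < M:
            carried = next_id
            next_id += 1
        citations.append(carried)
    misprinted = [c for c in citations if c != 0]
    return len(misprinted), len(set(misprinted))

T, D = simulate()
print(f"T = {T}, D = {D}, D/T = {D / T:.2f}")
```

Varying R (and M) in such a sketch shows how D, T, and their ratio respond, which is the intuition behind the estimates derived below.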
The evolution of the misprint distribution (here $N_K$ denotes the number of misprints that propagated K times, and N is the total number of citations) is described by the following rate equations:

$$\frac{dN_1}{dN} = M - (1-R)(1-M)\,\frac{N_1}{N},$$
$$\frac{dN_K}{dN} = (1-R)(1-M)\,\frac{(K-1)\,N_{K-1} - K\,N_K}{N} \quad (K > 1). \qquad (2)$$
These equations can be easily solved using methods developed in [4] to get:

$$N_K \sim 1/K^{\gamma}; \qquad \gamma = 1 + \frac{1}{(1-R)(1-M)}. \qquad (3)$$

As the exponent of the number-frequency distribution, $\gamma$, is related to the exponent of the rank-frequency distribution, $\alpha$, by the relation $\gamma = 1 + 1/\alpha$, Eq. (3) implies that:

$$\alpha = (1-R)(1-M). \qquad (4)$$
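The step from Eq. (2) to Eq. (3) is not spelled out; a short sketch of the standard argument (following the method of [4]) is as follows. For large N the counts grow linearly, $N_K = n_K N$, so $dN_K/dN = n_K$ and the second of Eqs. (2) becomes a recursion for the coefficients:

$$n_K\left[K + \frac{1}{(1-R)(1-M)}\right] = (K-1)\,n_{K-1},
\qquad\text{i.e.}\qquad
n_K \propto \frac{\Gamma(K)}{\Gamma\!\left(K + 1 + \frac{1}{(1-R)(1-M)}\right)} \sim K^{-\left(1 + \frac{1}{(1-R)(1-M)}\right)},$$

which is the power law of Eq. (3) with the quoted exponent $\gamma$.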
The rate equation for the total number of misprints is:

$$\frac{dT}{dN} = M + (1-R)(1-M)\,\frac{T}{N}. \qquad (5)$$

The stationary solution of Eq. (5) is:

$$T = N\,\frac{M}{R + M - R M}. \qquad (6)$$
The expectation value for the number of distinct misprints is obviously

$$D = M \times N. \qquad (7)$$

From Equations (6) and (7) we obtain:

$$R = \frac{D}{T} \times \frac{N - T}{N - D}. \qquad (8)$$

Substituting $D = 45$, $T = 196$, and $N = 4300$ in Equation (8), we obtain $R \approx 0.22$, which is very close to the initial estimate obtained using Eq. (1). This low value of R is consistent with the “Principle of Least Effort” [11].
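As a quick numerical check of Eq. (8) (our own re-computation of the figure quoted in the text):

```python
D, T, N = 45, 196, 4300      # distinct misprints, total misprints, total citations

# Eq. (8): R = (D / T) * (N - T) / (N - D)
R = (D / T) * (N - T) / (N - D)
print(f"R = {R:.2f}")        # prints R = 0.22
```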
One can ask why we did not choose to extract R using Equations (3) or (4). This is because $\alpha$ and $\gamma$ are not very sensitive to R when it is small. In contrast, T scales as $1/R$.
We can slightly modify our model and assume that original misprints are only introduced when the reference is derived from the original paper, while those who copy references do not introduce new misprints (e.g., they cut and paste). In this case one can show that $T = M \times N$ and $D = M \times N \times R$. As a consequence, Eq. (1) becomes exact (in terms of expectation values, of course).
The preceding analysis assumes that the stationary state has been reached. Is this reasonable? Eq. (5) can be rewritten as:

$$\frac{d(T/N)}{d\,\ln N} = M - (R + M - R M)\times(T/N). \qquad (9)$$

As long as M is small, it is natural to assume that the first citation was correct. Then the initial condition is $T = 0$ at $N = 1$. Eq. (9) can be solved to get:

$$T = \frac{M\,N}{R + M - R M}\left(1 - \frac{1}{N^{\,R + M - R M}}\right). \qquad (10)$$

This should be solved numerically for R. For our guinea pig, Eq. (10) gives $R \approx 0.17$.
Just as a cautionary note, Eq. (10) can be rewritten as:

$$T = D\,\frac{1 - 1/N^{x}}{x}; \qquad x \equiv R + M - R M. \qquad (11)$$

The definition of the natural logarithm is $\ln a = \lim_{x \to 0} \frac{a^{x} - 1}{x}$. Comparing this with Eq. (11), we see that when R is small (M is obviously always small):

$$T \approx D\,\ln N. \qquad (12)$$
This means that a naïve analysis using Eq.(1) or
Eq.(8) can lead to an erroneous belief that more
cited papers are less read.
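To illustrate with the numbers at hand (our own evaluation, not in the original text): even if nobody read the original, so that $R = 0$, Eqs. (1) and (12) would yield

$$\frac{D}{T} \approx \frac{1}{\ln N} = \frac{1}{\ln 4300} \approx 0.12,$$

an apparent "read ratio" that shrinks with N regardless of actual reading habits.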
We conclude that misprints in scientific citations should not be discarded as mere happenstance but, like Freudian slips, deserve to be analyzed.
We are grateful to J.M. Kosterlitz, A.V.
Melechko, and N. Sarshar for correspondence.
1. Silagadze, Z.K. Complex Syst. 11, 487
(1997); physics/9901035.
2. Redner, S., Eur. Phys. J. B 4, 131 (1998);
cond-mat/9804163.
3. Tsallis, C. and de Albuquerque, M. P., Eur. Phys. J. B 13, 777 (2000); cond-mat/9903433.
4. Krapivsky, P. L. and Redner, S., Phys. Rev.
E, 63, 066123 (2001); cond-mat/0011094.
5. Jeong, H., Neda, Z. and Barabasi, A.-L.,
cond-mat/0104131.
6. Vazquez, A., cond-mat/0105031.
7. Gupta, H. M., Campanha, J. R., Ferrari, B.
A., cond-mat/0112049.
8. Lehmann S., Lautrup B., Jackson, A. D.,
physics/0211010.
9. Freud, S., Zur Psychopathologie des
Alltagslebens, (Internationaler
psychoanalytischer Verlag, Leipzig, 1920).
10. Our guinea pig is the Kosterlitz-Thouless paper (J. Phys. C 6, 1181 (1973)). The misprint distributions for a dozen other papers we studied look very similar.
11. Zipf, G. K., Human Behavior and the
Principle of Least Effort: An Introduction to
Human Ecology, (Addison-Wesley,
Cambridge, MA, 1949).
12. Simon, H. A., Models of Man (Wiley, New
York, 1957).