The pure h-index: calculating an author’s h-index
by taking co-authors into account
Jin-kun WAN 1,2, Ping-huan HUA1,2 and Ronald ROUSSEAU 3
1 Library of Tsinghua University, Beijing, 100084, China
E-mail: wanjk@lib.tsinghua.edu.cn
2 China Scientometrics and Bibliometrics Research Center
P O BOX 84-48, Tsinghua University, Beijing, 100084, China
E-mail: gfhs@cnki.net
3 KHBO (Association K.U.Leuven), Industrial sciences and Technology,
B-8400 Oostende, Belgium
E-mail: ronald.rousseau@khbo.be
Abstract
We introduce a new Hirsch-type index for a scientist. This so-called pure
h-index, denoted by hP, takes the actual number of co-authors, and the
scientist’s relative position in the byline into account. The transformation from h
to hP can also be applied to the R-index, leading to the pure R-index, denoted
as RP. This index takes the number of collaborators, possibly the rank in the
byline and the actual number of citations into account.
Introduction
The h-index proposed by J. E. Hirsch (Hirsch, 2005) combines productivity with
impact. In this article we will not discuss its advantages and disadvantages
(see e.g. Glänzel, 2006; Jin et al., 2007), but will propose an adaptation of
the original proposal. This adaptation takes the number of co-authors into
account.
Recall that, when a researcher’s articles are ranked according to the number of
citations received, his or her Hirsch index is h if h is the highest rank (largest
natural number) such that each of the first h publications received at least h
citations. The Hirsch core is the set consisting of the first h publications, where,
in case of ties, a choice has to be made. In this article preference is given to
articles with the least number of authors. In other situations preference has
been given to the most recent articles (Jin, 2007; Jin et al., 2007). The Hirsch
core of author A will be denoted by H(A).
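The definition above can be sketched in Python as follows. The function name `hirsch_core` and the representation of a paper as a `(citations, n_authors)` tuple are assumptions made for this illustration only. Ties are broken in favour of the paper with fewer authors, which is the convention chosen in this article.

```python
def hirsch_core(papers):
    """Return (h, core) for a list of (citations, n_authors) tuples.

    Papers are ranked by decreasing citations; ties are broken in favour
    of the paper with fewer authors (the convention used in this article).
    """
    ranked = sorted(papers, key=lambda p: (-p[0], p[1]))
    h = 0
    for rank, (citations, _n_authors) in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h, ranked[:h]


# Hypothetical record with citations 30, 2, 2, 1; the two papers with
# 2 citations are tied, and the single-authored one enters the core.
h, core = hirsch_core([(30, 3), (2, 3), (2, 1), (1, 3)])
print(h, core)  # prints: 2 [(30, 3), (2, 1)]
```

With a different tie-breaking rule (for instance, preferring the most recent article) only the sort key would change.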
Papers belonging to a scientist’s Hirsch core may be written by this author as a
single author or in collaboration with colleagues. The question we want to study
in this note is: how can the h-index be adapted in order to take account of the
number of collaborators? Indeed, it goes without saying that it is much easier to
get a high h-index when one has written many papers with many collaborators.
We will moreover take an author’s rank in the byline into account and propose a
new index, denoted as hP, for evaluating the so-called pure contribution of a
given author.
The idea of taking the number of co-authors into account has already been
considered by Batista et al. (2006). They simply divide h by the average number
of researchers in the publications of the Hirsch core. Quentin Burrell (2007)
proposes to discount the h-index for career length, multi-authorship and
self-citations. He notes that if discounting is performed before the determination
of the Hirsch core this core itself can be reduced. This is one possible approach.
We will take another approach by first determining the h-index and Hirsch core
in the usual way, and then determining a complementary index. Egghe (2007)
presents a mathematical theory of the h-index (and also of the g-index) in case
of fractional counting (see next section for a definition). He considers fractional
counting of citations as well as fractional counting of publications.
Methods for accrediting publications to authors
In this section we present a short overview of some scoring methods (Egghe et
al., 2000). The number of co-authors of an article is denoted by N. The term
‘normalized score’ is used to indicate that the sum of the scores of all
co-authors is equal to one.
(1) First-author counting (Cole & Cole, 1973)
Only the first of the N authors of a paper receives a credit equal to one. The
other authors do not receive any credit. This method is also known as straight
counting. It has been argued, again and again, that this is not an acceptable
method for assigning credits to authors (Lindsey, 1980).
(2) Total counting
Here, each of the N authors receives one credit. This counting method is also
called normal, or standard counting.
(3) Fractional counting (Price, 1981; Oppenheim, 1998)
Now, each of the N authors receives a score equal to 1/N. This counting method
is sometimes called adjusted counting. Fractional counting has been studied
e.g. in (Burrell and Rousseau, 1995; Van Hooydonk, 1997).
(4) Proportional or arithmetic counting (Van Hooydonk, 1997)
If an author has rank R in the author list of an article with N collaborators (R =
1, …, N), then she/he receives a score of N+1-R. This score can be normalized
in such a way that the total score of all authors is equal to 1. In this normalized
version the score is $\frac{2}{N}\left(1 - \frac{R}{N+1}\right)$.
(5) Geometric counting (Egghe et al., 2000)
If an author has rank R in an article with N co-authors (R = 1,…,N) then she/he
receives a credit of $2^{N-R}$. In its normalized version this score becomes
$\frac{2^{N-R}}{2^N - 1}$.
(6) Noblesse oblige, cf. (Zuckerman, 1968)
In this approach it is assumed that the most important author closes the list.
She/he receives a credit of 0.5, while the other N-1 authors receive a credit of
1/(2(N-1)) each (this is but one suggestion, among many more that are possible
here). Clearly, this concept only makes sense if an article has more than one
author. In the case of one author this counting method assigns a score of one to
the single author.
We note that methods (4), (5) and (6) assume that the rank of the authors in the
byline accurately reflects their contribution. If authors adopt alphabetical
ordering, or take turns in being first and second author, these counting schemes
should not be applied.
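As an illustration, the normalized versions of methods (3) to (6) can be written as one small Python function. The function name `normalized_score` and the method labels are assumptions introduced here; exact fractions are used so that the normalization (scores summing to one) can be checked without rounding error.

```python
from fractions import Fraction


def normalized_score(method, rank, n):
    """Normalized credit of the author at position `rank` (1-based) among n authors."""
    if not 1 <= rank <= n:
        raise ValueError("rank must lie between 1 and n")
    if method == "fractional":
        return Fraction(1, n)
    if method == "proportional":
        return Fraction(2 * (n + 1 - rank), n * (n + 1))
    if method == "geometric":
        return Fraction(2 ** (n - rank), 2 ** n - 1)
    if method == "noblesse_oblige":
        if n == 1:
            return Fraction(1)
        # The last author receives 1/2; the others share the remaining 1/2.
        return Fraction(1, 2) if rank == n else Fraction(1, 2 * (n - 1))
    raise ValueError(f"unknown method: {method}")


# Each normalized scheme distributes exactly one credit over the byline.
for method in ("fractional", "proportional", "geometric", "noblesse_oblige"):
    for n in range(1, 8):
        assert sum(normalized_score(method, r, n) for r in range(1, n + 1)) == 1
```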
The co-author adapted h-index, based on the concept of the equivalent
number of co-authors
In the previous list of scoring methods, only total counting is not normalized.
This method will not be used further as our approach is based on normalized
scores. Also first-author counting will not be considered further. We will now
introduce the concepts leading to the definition of an h-index representing the
so-called pure contribution of an author.
Definition: the equivalent number of co-authors of author A in document D.
This concept, denoted by NE(A,D), is defined as $N_E(A,D) = \frac{1}{S(A,D)}$, where S(A,D) denotes
the normalized score of author A in document D.
Clearly, NE(A,D) is at least equal to 1. It has no theoretical upper limit. For a
single-authored article NE(A,D) is always equal to 1. When using fractional
counting NE(A,D) is always equal to N, the actual number of co-authors of the
article. For proportional counting $N_E(A,D) = \frac{N(N+1)}{2(N+1-R)}$. This value lies
between (N+1)/2 (for rank 1) and N(N+1)/2 (for rank N). In the case of
geometric counting $N_E(A,D) = \frac{2^N - 1}{2^{N-R}}$. This value lies between
$\frac{2^N - 1}{2^{N-1}}$ (rank 1, which is about 2 for N large) and $2^N - 1$ (rank N).
Finally, in the case of noblesse
oblige the most important author (closing the list; and assuming we are not
dealing with a single-authored article) always has an NE(A,D) equal to 2, while
the other authors’ NE(A,D) is 2(N-1). This number is at least equal to 2 (the case
of two authors).
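The stated bounds can be checked numerically. The sketch below assumes the closed forms NE = N for fractional counting, NE = N(N+1)/(2(N+1-R)) for proportional counting and NE = (2^N - 1)/2^(N-R) for geometric counting; the function name `n_equivalent` is hypothetical.

```python
from fractions import Fraction


def n_equivalent(method, rank, n):
    """Equivalent number of co-authors N_E = 1 / S for a 1-based rank among n authors."""
    if method == "fractional":
        return Fraction(n)
    if method == "proportional":
        return Fraction(n * (n + 1), 2 * (n + 1 - rank))
    if method == "geometric":
        return Fraction(2 ** n - 1, 2 ** (n - rank))
    raise ValueError(f"unknown method: {method}")


for n in range(1, 10):
    # Proportional counting: from (N+1)/2 for rank 1 up to N(N+1)/2 for rank N.
    assert n_equivalent("proportional", 1, n) == Fraction(n + 1, 2)
    assert n_equivalent("proportional", n, n) == Fraction(n * (n + 1), 2)
    # Geometric counting: from (2^N - 1)/2^(N-1) for rank 1 up to 2^N - 1 for rank N.
    assert n_equivalent("geometric", 1, n) == Fraction(2 ** n - 1, 2 ** (n - 1))
    assert n_equivalent("geometric", n, n) == 2 ** n - 1

print(float(n_equivalent("geometric", 1, 20)))  # close to, but below, 2
```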
Definition: The equivalent Hirsch core average number of authors
The equivalent Hirsch core average number of authors for author A, denoted as
E(A), is defined as:
$E(A) = \frac{1}{h} \sum_{D \in H(A)} N_E(A,D)$   (1)
Definition: The pure or co-author adapted h-index
We define the pure h-index of author A, denoted by hP(A), as:
$h_P(A) = \frac{h}{\sqrt{E(A)}} = \frac{h\sqrt{h}}{\sqrt{\sum_{D \in H(A)} N_E(A,D)}}$   (2)
Clearly, when author A has written all his/her articles in the Hirsch core as sole
author, h(A) = hP(A). In all other cases hP(A) < h(A).
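Formulae (1) and (2) translate directly into code. In this minimal sketch the function name `pure_h` is an assumption, and the input is taken to be the list of NE(A,D) values of the articles in the Hirsch core, so that its length equals h.

```python
import math


def pure_h(equivalent_authors):
    """Pure h-index h_P = h / sqrt(E) from the N_E values of the Hirsch-core papers.

    `equivalent_authors` holds one N_E(A, D) value per core paper, so its
    length is the ordinary h-index.
    """
    h = len(equivalent_authors)
    if h == 0:
        return 0.0
    e = sum(equivalent_authors) / h  # formula (1)
    return h / math.sqrt(e)          # formula (2)


print(pure_h([1, 1, 1]))          # all core papers single-authored: h_P = h = 3.0
print(round(pure_h([3, 2]), 3))   # E = 2.5, so h_P = 2 / sqrt(2.5), about 1.265
```

Since every NE(A,D) is at least 1, E(A) is at least 1 and the function never returns a value above h.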
Some examples
Example 1
Assume that three authors, A, B and C always publish together and always in
the same order, namely B – C – A. Assume further that their h-index is equal to
h. Observe that, because of our assumptions, this h-index must be the same for
these three authors.
What is their pure h-index? If fractional counting is used, their hP-values are still
equal, but they are now reduced to $h/\sqrt{3}$. If arithmetic counting is applied, E(B) = 2,
hence $h_P(B) = h/\sqrt{2}$; E(C) = 3, hence $h_P(C) = h/\sqrt{3}$; and E(A) = 6, leading to
$h_P(A) = h/\sqrt{6}$.
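The arithmetic-counting values of Example 1 can be verified with a few lines of Python, assuming NE = N(N+1)/(2(N+1-R)) for an author at rank R among N authors. The variable names are chosen for this illustration only.

```python
import math
from fractions import Fraction

N = 3  # every paper has three authors, always in the order B - C - A
rank_in_byline = {"B": 1, "C": 2, "A": 3}

# All core papers of an author share the same N_E, so E equals that value.
e_values = {
    author: Fraction(N * (N + 1), 2 * (N + 1 - rank))
    for author, rank in rank_in_byline.items()
}

for author, e in e_values.items():
    # h_P = h / sqrt(E); print E and the factor that multiplies h.
    print(author, e, round(1 / math.sqrt(e), 3))
```

The three printed factors are approximately 0.707, 0.577 and 0.408, so the last author retains the smallest share of the common h-index.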
Example 2
Assume that the following Table 1 gives the full publication and citation details
of five authors: V, W, X, Y and Z; authors are given in the order they are
mentioned in the byline. Table 2 gives the details for the calculation of the pure
h-index.
Table 1
Authors    V   W-V  W-X  V  Z  X-Y-Z  X-Y-Z  V-Y  X-Z-W
Citations  10  2    1    5  2  1      2      2    30
Besides the data necessary for calculating h and hP Table 2 also shows the
values of these authors’ R-index, introduced in (Jin et al., 2007). The R-index is
equal to the square root of the sum of the actual number of citations of articles
in the Hirsch core. For author A it is given as shown in formula (3):
$R(A) = \sqrt{\sum_{D \in H(A)} \mathrm{cit}(A,D)}$   (3)
Also this index can be divided by the square root of E(A), leading to an index
denoted as RP (last two rows of Table 2). This new indicator is called a pure
R-index, see formula (4):
$R_P(A) = \sqrt{\frac{\sum_{D \in H(A)} \mathrm{cit}(A,D)}{E(A)}}$   (4)
Table 2. Calculation of hP and RP using fractional and arithmetic counting
Authors       V            W         X            Y        Z
Citations     10, 5, 2, 2  30, 2, 1  30, 2, 1, 1  2, 2, 1  30, 2, 2, 1
h-index       2            2         2            2        2
NE (fract.)   1, 1         3, 2      3, 3         3, 2     3, 1
NE (prop.)    1, 1         6, 1.5    2, 2         3, 3     3, 1
E (fract.)    1            2.5       3            2.5      2
E (prop.)     1            3.75      2            3        2
hP (fract.)   2            1.26      1.15         1.26     1.41
hP (prop.)    2            1.03      1.41         1.15     1.41
R             3.87         5.66      5.66         2        5.66
RP (fract.)   3.87         3.58      3.27         1.26     4.00
RP (prop.)    3.87         2.92      4.00         1.15     4.00
Note also that, for author Z, we have given preference to the article with the
least number of authors (here one).
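The figures of Table 2 can be recomputed from the data of Table 1. The following is a sketch rather than a reference implementation: the names `PAPERS`, `n_equivalent` and `indicators` are assumptions, ties are broken in favour of articles with fewer authors, and the code presumes that every author has an h-index of at least 1 (true for Table 1).

```python
import math
from fractions import Fraction

# Table 1: (byline, citations)
PAPERS = [
    ("V", 10), ("W-V", 2), ("W-X", 1), ("V", 5), ("Z", 2),
    ("X-Y-Z", 1), ("X-Y-Z", 2), ("V-Y", 2), ("X-Z-W", 30),
]


def n_equivalent(method, rank, n):
    """N_E for fractional ("fract") or proportional ("prop") counting."""
    if method == "fract":
        return Fraction(n)
    return Fraction(n * (n + 1), 2 * (n + 1 - rank))


def indicators(author, method):
    """Return (h, E, h_P, R, R_P) for one author of Table 1."""
    own = []
    for byline, cites in PAPERS:
        names = byline.split("-")
        if author in names:
            own.append((cites, len(names), names.index(author) + 1))
    # Rank by citations; ties go to the paper with fewer authors.
    own.sort(key=lambda p: (-p[0], p[1]))
    h = max((i for i, p in enumerate(own, start=1) if p[0] >= i), default=0)
    core = own[:h]
    e = sum(n_equivalent(method, rank, n) for _, n, rank in core) / h
    r = math.sqrt(sum(cites for cites, _, _ in core))
    return h, e, h / math.sqrt(e), r, r / math.sqrt(e)


for author in "VWXYZ":
    for method in ("fract", "prop"):
        h, e, hp, r, rp = indicators(author, method)
        print(author, method, h, float(e), round(hp, 2), round(r, 2), round(rp, 2))
```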
According to the standard h-index, these five authors attain the same score.
Table 3 shows the rankings of these five authors, based on the five other
methods. These different rankings again illustrate that different counting
methods lead to different rankings.
Table 3. Rankings of the five authors of Table 1, according to different h-type
indices.
Authors       V  W  X  Y  Z
hP (fract.)   1  3  5  3  2
hP (prop.)    1  5  2  4  2
R             4  1  1  5  1
RP (fract.)   2  3  4  5  1
RP (prop.)    3  4  1  5  1
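The rankings in Table 3 assign tied authors the same (best) rank and skip the following ranks. A minimal sketch of this ranking rule, applied to the two-decimal values printed in Table 2, is given below; the function name `competition_rank` is hypothetical.

```python
def competition_rank(scores):
    """Rank authors by decreasing score; tied authors share the best rank."""
    return {
        name: 1 + sum(other > value for other in scores.values())
        for name, value in scores.items()
    }


# Two-decimal values as printed in Table 2.
hp_fract = {"V": 2.00, "W": 1.26, "X": 1.15, "Y": 1.26, "Z": 1.41}
rp_prop = {"V": 3.87, "W": 2.92, "X": 4.00, "Y": 1.15, "Z": 4.00}

print(competition_rank(hp_fract))  # {'V': 1, 'W': 3, 'X': 5, 'Y': 3, 'Z': 2}
print(competition_rank(rp_prop))   # {'V': 3, 'W': 4, 'X': 1, 'Y': 5, 'Z': 1}
```

Rounded values are used on purpose: with unrounded floating-point results, scores that are mathematically equal might differ in the last digit and no longer be recognized as ties.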
The hP-index, based on fractional counting, ranks these authors as V, followed
by Z, then W and Y (tied) and finally X; the hP-index, based on arithmetic counting,
ranks these authors as V, followed by X and Z (tied), then Y and finally W.
According to the R-index authors W, X and Z score equally ($\sqrt{32} \approx 5.66$),
followed by authors V and Y, in that order. This result illustrates the (obvious)
fact that taking actual citations into account gives a different (in our opinion,
better) view on the achievements of these authors. Using the pure R-index, an
indicator that incorporates also the number of collaborators, leads to an even
more refined appreciation.
Additional observations
When fractional counting is used the exact rank occupied by an author does not
play any role. Yet, even then our proposal does not coincide with that by Batista
et al. (2006). We reduce the effect of a large number of authors by taking the
square root. In this way, authors are less ‘punished’ for having collaborated in a
mega-authored, highly-cited article.
It is sometimes possible for an author to obtain a higher hP-value by replacing
an article in the Hirsch core by one outside the core but with fewer collaborators.
We propose not to allow this, as we only seek to complement the h-index.
Moreover, it would make the procedure considerably more difficult, as many
combinations would have to be tried in order to find the optimal one. The next
example shows that it is indeed possible to increase the hP-value in this way.
Assume that author T has the following publication list:
Authors    A-T  A-B-T  T  T
Citations  3    3      2  1
Then h(T) = 2, E(T) = 2.5, hP(T) = 1.265 and RP(T) = 1.55, using fractional
counting. Using arithmetic counting E(T) = 4.5, hP(T) = 0.94 and RP(T) = 1.15.
Considering T’s publications in the order:
Authors    A-T  T  A-B-T  T
Citations  3    2  3      1
one could say that h(T) is still equal to 2 (this is, of course, not the correct way of
calculating h), E(T) = 1.5 and hP(T) would be 1.633 > 1.265 (fractional counting);
or E(T) = 2 and hP(T) = 1.41 > 0.94 (arithmetic counting). This line of approach
is usually counterproductive for the calculation of the RP-index, as the total
number of citations is lowered, yet in this example RP(T) would be 1.83 > 1.55
(fractional counting); and RP(T) = 1.58 > 1.15 (arithmetic counting). As stated
before, we do not encourage this calculating method.
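The numbers in this example can be reproduced with the sketch below, which evaluates hP and RP for an explicitly chosen set of core papers. The names `pure_indices`, `proper_core` and `swapped_core` are assumptions; each paper is described by its citations, its number of authors and the rank of T in the byline.

```python
import math
from fractions import Fraction


def n_equivalent(method, rank, n):
    """N_E for fractional ("fract") or arithmetic ("arith") counting."""
    if method == "fract":
        return Fraction(n)
    return Fraction(n * (n + 1), 2 * (n + 1 - rank))


def pure_indices(core, method):
    """h_P and R_P for a chosen core of (citations, n_authors, rank_of_T) papers."""
    h = len(core)
    e = sum(n_equivalent(method, rank, n) for _, n, rank in core) / h
    total = sum(cites for cites, _, _ in core)
    return h / math.sqrt(e), math.sqrt(total / e)


proper_core = [(3, 2, 2), (3, 3, 3)]   # A-T and A-B-T
swapped_core = [(3, 2, 2), (2, 1, 1)]  # A-T and the single-authored paper

for method in ("fract", "arith"):
    print(method,
          [round(x, 2) for x in pure_indices(proper_core, method)],
          [round(x, 2) for x in pure_indices(swapped_core, method)])
```

Searching for the best such replacement in general would require trying many subsets of an author's publications, which is the combinatorial difficulty mentioned above.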
Conclusion
We have introduced an adaptation of the h-index, which takes the actual
number of co-authors and the relative position of an author into account. It is a
practical way of discounting the h-index as suggested by Burrell (2007). In real
applications many authors may have the same h-index. Applying a
complementary index such as the pure h-index introduces a method of
discriminating among such authors. The pure R-index, denoted as RP, takes
moreover the number of collaborators, possibly the rank in the byline and the
actual number of citations into account.
It is well-known (Egghe et al., 2000; Burrell, 2007) that different counting
methods lead to different rankings. This is also true in the context of h-type
indices. Hence, the concrete counting method should be determined (and
preferably validated) in advance. When the order of authors in the byline does
not reflect the actual contribution then only fractional counting can be applied.
Acknowledgements
The work presented by Ronald Rousseau in this paper was supported by the
National Natural Science Foundation of China by grant no. 70673019.
References
Pablo D. Batista, Mônica G. Campiteli, Osame Kinouchi, and Alexandre S.
Martinez (2006). “Is it possible to compare researchers with different scientific
interests?,” Scientometrics, volume 68, number 1, pp.179-189.
Quentin Burrell (2007). “Should the h-index be discounted?,” In: W. Glänzel and
A. Schubert (editors). The multidimensional world of Tibor Braun. Leuven: ISSI,
pp. 65-68.
Quentin Burrell, and Ronald Rousseau (1995). “Fractional counts for
authorship attribution: a numerical study,” Journal of the American Society for
Information Science, volume 46, number 2, pp. 97-102.
Jonathan R. Cole, and Stephen Cole (1973). “Social stratification in science”.
Chicago: The University of Chicago Press.
Leo Egghe (2007). “Mathematical theory of the h- and g-index in case of
fractional counting of authorship”. Preprint
Leo Egghe, Ronald Rousseau, and Guido Van Hooydonk (2000). “Methods for
accrediting publications to authors or countries: consequences for evaluation
studies,” Journal of the American Society for Information Science, volume 51,
number 2, pp. 145-157.
Wolfgang Glänzel (2006). “On the opportunities and limitations of the h-index,”
Science Focus, volume 1, number 1, pp. 10-11 (in Chinese). English version
available at E-LIS: ID-code 9535.
Jorge E. Hirsch (2005). “An index to quantify an individual’s scientific research
output,” Proceedings of the National Academy of Sciences of the United States
of America, volume 102, number 46, pp. 16569-16572.
Bihui Jin (2007). “The AR-index: complementing the h-index,” ISSI Newsletter,
volume 3, number 1, p.6.
Bihui Jin, Liming Liang, Ronald Rousseau, and Leo Egghe (2007). “The R- and
AR-indices: complementing the h-index,” Chinese Science Bulletin, volume 52,
number 6, pp. 855-863.
Duncan Lindsey (1980). “Production and citation measures in the sociology of
science: the problem of multiple authorship,” Social Studies of Science, volume
10, number 2, pp. 145-162.
Charles Oppenheim (1998). “Fractional counting of multiauthored publications,”
Journal of the American Society for Information Science, volume 49, number 5,
p. 482.
Derek de Solla Price (1981). “Multiple authorship,” Science, volume 212, issue
4498, p. 987.
Guido Van Hooydonk (1997). “Fractional counting of multi-authored
publications: consequences for the impact of authors,” Journal of the American
Society for Information Science, volume 48, number 10, pp. 944-945.
Harriet Zuckerman (1968). “Patterns of name-ordering among authors of
scientific papers: a study of social symbolism and its ambiguity,” American
Journal of Sociology, volume 74, number 3, pp. 276-291.