Does Google Scholar contain all highly cited documents
(1950-2013)?
Alberto Martín-Martín¹, Enrique Orduña-Malea², Juan Manuel Ayllón¹,
Emilio Delgado López-Cózar¹
¹ EC3: Evaluación de la Ciencia y de la Comunicación Científica, Universidad de Granada (Spain)
² EC3: Evaluación de la Ciencia y de la Comunicación Científica, Universidad Politécnica de Valencia (Spain)
ABSTRACT
The study of highly cited documents on Google Scholar (GS) has never been addressed to date in a comprehensive manner. The
objective of this work is to identify the set of highly cited documents in Google Scholar and define their core characteristics: their
languages, their file format, or how many of them can be accessed free of charge. We will also try to answer some additional questions
that will hopefully shed some light on the use of GS as a tool for assessing scientific impact through citations.
The decalogue of research questions is shown below:
1. Which are the most cited documents in GS?
2. Which are the most cited document types in GS?
3. In what languages are the most cited documents in GS written?
4. How many highly cited documents are freely accessible?
a. What file types are the most commonly used to store these highly cited documents?
b. Which are the main providers of these documents?
5. How many of the highly cited documents indexed by GS are also indexed by WoS?
6. Is there a correlation between the number of citations that these highly cited documents have received in GS and the number of
citations they have received in WoS?
7. How many versions of these highly cited documents has GS detected?
8. Is there a correlation between the number of versions GS has detected for these documents, and the number of citations they
have received?
9. Is there a correlation between the number of versions GS has detected for these documents, and their position in the search
engine result pages?
10. Is there some relation between the positions these documents occupy in the search engine result pages, and the number of
citations they have received?
To answer these questions, we collected a set of 64,000 documents indexed in Google Scholar by performing 64 queries, one per
year from 1950 to 2013, using Google Scholar’s advanced search and collecting the maximum number of records that GS displays
for any given query (always 1,000). These 64,000 documents have received 122,245,865 citations in Google Scholar
and 35,182,077 in the Web of Science Core Collection.
Full raw data available at: http://dx.doi.org/10.6084/m9.figshare.1224314
KEYWORDS
Google Scholar / Academic Search Engines / Top cited documents / Highly cited documents / Citation Analysis /
Language / Open Access / Editions / Academic SEO / Search Engine Optimization / SERP / Search Engine Result Page /
Web of Science
EC3’s Document Series:
EC3 Working Papers Nº 19
Document History
Version 1.2, Published on November 3, 2014,
Granada
Cite as
Martín-Martín, A.; Orduña-Malea, E.; Ayllón, J.M.; Delgado López-Cózar, E. (2014). Does
Google Scholar contain all highly cited documents (1950-2013)? Granada: EC3 Working
Papers, 19: November 3, 2014
Corresponding author
Emilio Delgado López-Cózar. edelgado@ugr.es
1. INTRODUCTION
1.1 About this title
The title of this work, and its structure as a series of questions, is not simply a rhetorical device
intended to attract the reader’s attention. It is a genuine statement of intent, since there is no
absolute empirical certainty that our sample contains all the highly cited documents present in Google
Scholar (GS) at the moment we collected the data. If GS provided a feature that allowed us to sort
documents according to number of citations, as traditional bibliometric databases do (Web of Science
and Scopus), we wouldn’t harbor any doubts about this matter. Since this is not the case, we cannot
be completely sure that when we make a query by year of publication in GS, it will show us the 1,000
most cited documents published in that year (as we know, 1,000 is the maximum
number of results GS will display for any given query). In short, we are not entirely sure that the data
we collected comprises only highly cited documents in GS, and therefore it is likely that some of these
documents don’t actually belong to the group of “upper crust” documents in GS for each of the years
in the selected range (1950-2013).
Nevertheless, there is strong evidence suggesting that our sample contains a very large portion of the
highly cited documents in GS:
Firstly, in its documentation, GS explicitly declares¹ that the number of citations received by a
document is one of the factors involved in the calculation of the position this document will occupy on
the results page, although they don’t specify the overall weight of this factor in the calculation. A high
correlation between the position documents occupy in the search engine results page (SERP) when
they are sorted by Google Scholar’s default relevance criteria, and the position they occupy when they
are sorted simply by their number of citations (see Question 10, Figure 24) would confirm that citation
count is indeed the factor that is given the highest weight in Google Scholar’s ranking algorithm, and
therefore it would be safe to presume that the first positions of a query will always be occupied by the
most cited documents that satisfy said query.
Secondly, other evidence supports the validity of our sample: in order to verify that
the documents in our sample were in fact highly cited documents, we retrieved the 1,000 most
cited documents in the Web of Science Core Collection for each year in the range 1950-2013 (as of
October 30th, 2014), and compared the two sets of documents for each year. The results showed
that, on average, 81% of the documents in our GS sample with a link to a WoS record² were also
present in the ranking of the 1,000 most cited documents in WoS. With the WoS dataset, we could
also learn how many highly cited documents in WoS were missing from our GS dataset. In this
respect, the results show that the number of highly cited documents in WoS that are not present in our
GS sample is negligible. There are only 396 (1.3%) documents in our WoS sample that have
received enough citations to be included among the 30,000 most cited documents in our GS sample,
but that according to their document ID are not present in this sample. Likewise, if we consider the
40,000 most cited documents in our sample, this figure rises to 1,645 (4.1%). As we lower the
citation threshold, this figure obviously increases (see Question 1). This result seems logical for two
reasons:
¹ About Google Scholar: How are documents ranked?
http://scholar.google.com/intl/en/scholar/about.html [accessed on October 7th 2014]
² Collaboration between Google Scholar and Web of Science
http://wokinfo.com/googlescholar/ [accessed on October 24th 2014]
a) ranking factors: citations are the main ranking factor, but not the only one. Therefore, for
documents with the highest number of citations, the position achieved is clearly correlated with
citations. In contrast, in the lower positions, where the number of citations is also lower, the
effect of other ranking factors is more evident.
b) statistical noise: in the first positions, the differences between documents in terms of
citations are large, so an error would have to be very large to place a document in the wrong
position. However, as we approach the cut-off (1,000 documents), the differences
between documents are smaller, and therefore small errors can result in significant
changes in position in the lower ranks (especially for positions in the range 800-1,200).
Lastly, there is our own experience, gained through the daily observation of hundreds of searches. Usually,
the relevance ranking used by GS amounts to simply placing the most cited documents in the first
results pages, with very rare exceptions. This is something anyone can check just by doing a search
in Google Scholar. We encourage researchers to experience this for themselves.
To sum up, in this work we analyse the 1,000 documents that GS retrieves for each one of 64 queries
by year, from 1950 until 2013. Presumably, among them we should be able to find the most cited
documents published in each of those years.
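The comparison between the GS sample and the WoS rankings described above can be sketched roughly as follows. This is only an illustration under stated assumptions: records are matched by their WoS accession number (UT), the identifiers and citation counts below are made-up toy values, and the function names are hypothetical rather than taken from the study.

```python
def share_in_wos_top(gs_linked_uts, wos_top_uts):
    """Share of WoS-linked GS records that also appear in the WoS top list."""
    if not gs_linked_uts:
        return 0.0
    return len(gs_linked_uts & wos_top_uts) / len(gs_linked_uts)


def missing_from_gs(wos_citations, gs_uts, threshold):
    """WoS records cited at least `threshold` times whose UT is absent from the GS sample."""
    return {ut for ut, cites in wos_citations.items()
            if cites >= threshold and ut not in gs_uts}


# Toy data for a single year (hypothetical identifiers and counts).
gs_uts = {"UT1", "UT2", "UT3", "UT4", "UT5"}
wos_top = {"UT1", "UT2", "UT3", "UT4", "UT9"}
wos_citations = {"UT1": 500, "UT9": 300, "UT8": 50}

overlap = share_in_wos_top(gs_uts, wos_top)
missing = missing_from_gs(wos_citations, gs_uts, threshold=100)
print(f"{overlap:.0%} of linked GS records are in the WoS top list")
print("Highly cited in WoS but missing from GS sample:", sorted(missing))
```

In this sketch the citation threshold is a free parameter; in the study it corresponds to the citation count needed to enter the 30,000 or 40,000 most cited documents of the GS sample.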
1.2 Citation Classics: Highly Cited Documents
The idea of identifying the most influential documents in science using the number of citations they
generate in the scientific literature was introduced, like many other bibliometric tools, by Eugene
Garfield. On January 3rd 1977, Garfield published an essay entitled “Introducing Citation Classics: the
human side of scientific papers” (1977), which appeared in Current Contents. The candidates for
Citation Classics were selected from a group of the 500 most cited papers during the years 1961-1975.
Many of these had been listed before in Current Contents. From 1977 to 1993, 400 Citation Classic
Commentaries were published in Current Contents. The full texts of these mostly one-page articles
are now available in an open access server at http://garfield.library.upenn.edu/classics.html.
From 2001, the Highly Cited Papers were integrated in a new product from Thomson Scientific: the
Essential Science Indicators. Neither Scopus nor other databases have released alternatives to this
product.
What we do have is an extensive scientific literature, published during the last few decades, on the
matter of highly cited documents in different journals, subject areas, institutions or countries
(Oppenheim & Renn 1978; Narin & Frame 1983; Plomp 1990; Glänzel & Czerwon 1992; Glänzel &
Schubert 1992a-b; Glänzel et al. 1995; Tijssen et al. 2002; Aksnes 2003; Aksnes & Sivertsen 2004;
Kresge et al. 2005; Levitt & Thelwall 2009; Smith 2009; Persson 2010). Recently, the need to rank
any product of scientific activity according to its citation performance has led to the emergence of
classifications of this kind (top 1%, 10%, 15%). The calculation of percentiles, previously proposed
explicitly by Maltrás (2003), has recently been rediscovered by other authors (Bornmann 2010,
Bornmann & Mutz 2011, Bornmann et al. 2011).
The appearance of Google Scholar opened up new possibilities in this field. Its birth at the end of 2004
signaled a revolution in the way scientific publications were searched, retrieved and accessed (Jacsó,
2005).
From the outset, GS became not only a search engine for scientific and academic documents, but
also for the citations these documents receive. Although it took five years to get over its “beta” stage,
today we can say without a doubt that GS is not only the largest database of scientific, academic and
technical information in the world (Orduña-Malea et al., 2014, Ortega 2014), but also the richest and
most varied, since Google’s crawlers systematically parse and process the whole academic web, not
making distinctions based on subject areas, languages, or countries (Ortega 2014). Despite the
limitations of its spiders and processing software, the lack of normalization processes and quality
control filters, GS is an irreplaceable source of global scientific knowledge.
Studies about GS have been limited to: a) explaining how it works, its features, limitations, errors, etc.; b)
defining its coverage and size; c) comparing the number of citations received by documents of a given
subject area in GS to the citations they receive in other databases; and d) describing its growth and evolution
over time. However, the study of highly cited documents regardless of their discipline or field has
never been addressed in a comprehensive manner.
Therefore, the objective of this work is to identify the set of highly cited documents in GS and define
their core characteristics: language, file format, and how many of them can be accessed free of
charge. We will also try to answer some additional questions that will hopefully shed some light on
the use of GS as a tool for assessing impact through citations.
In short, we intend to answer the following questions:
2. RESEARCH QUESTIONS
1. Which are the most cited documents in GS?
2. Which are the most cited document types in GS?
3. In what languages are the most cited documents in GS written?
4. How many highly cited documents are freely accessible?
a. What file types are the most commonly used to store these highly cited documents?
b. Which are the main providers of these documents?
5. How many of the highly cited documents indexed by GS are also indexed by WoS?
6. Is there a correlation between the number of citations that these highly cited documents have
received in GS and the number of citations they have received in WoS?
7. How many versions of these highly cited documents has GS detected?
8. Is there a correlation between the number of versions GS has detected for these documents,
and the number of citations they have received?
9. Is there a correlation between the number of versions GS has detected for these documents,
and their position in the search engine result pages?
10. Is there some relation between the positions these documents occupy in the search engine
result pages, and the number of citations they have received?
3. MATERIALS AND METHODS
This longitudinal study describes a set of 64,000 documents indexed in Google Scholar, obtained after
performing 64 queries by year (from 1950 to 2013) using Google Scholar’s advanced search, and
collecting the maximum number of records that GS displays for any given query, which as we know is
always 1,000.
This process was carried out twice, with a few days between the first and the second download
processes. In one case, it was done from a computer connected to our university’s IP range (to obtain
WoS data embedded in GS), and in the other case, from a computer with a normal Internet connection
(to obtain data about open access links unadulterated by our university’s subscriptions). Besides, this
also worked as a reliability check, because we confirmed that the two datasets contained the same
records. These processes took place on the 28th of May and 2nd of June, 2014.
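The collection plan described above (64 yearly queries, up to 1,000 records each) can be laid out as a simple schedule of result pages. The sketch below only builds that schedule and downloads nothing. The endpoint, the URL parameter names (`as_ylo`, `as_yhi`, `start`) and the page size of 10 are assumptions made for illustration, not details taken from the study.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/scholar"  # assumed endpoint
FIRST_YEAR, LAST_YEAR = 1950, 2013
MAX_RESULTS = 1000  # maximum number of records GS displays per query (from the text)
PAGE_SIZE = 10      # assumption: number of results per page


def plan_queries(first_year=FIRST_YEAR, last_year=LAST_YEAR,
                 max_results=MAX_RESULTS, page_size=PAGE_SIZE):
    """Return one (year, offset, url) tuple per result page to download."""
    plan = []
    for year in range(first_year, last_year + 1):
        for offset in range(0, max_results, page_size):
            # Restrict the query to a single publication year (assumed parameters).
            params = {"as_ylo": year, "as_yhi": year, "start": offset}
            plan.append((year, offset, f"{BASE_URL}?{urlencode(params)}"))
    return plan


plan = plan_queries()
years = sorted({year for year, _, _ in plan})
print(len(years), "yearly queries,", len(plan), "result pages,",
      len(plan) * PAGE_SIZE, "records at most")
```

Running the same plan from two different network locations, as the authors did, would yield two datasets whose records can then be compared as a reliability check.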
We downloaded the source HTML code for each of the result pages in our queries, parsed them to
extract all the relevant information, and saved it in a spreadsheet, which is a format more appropriate for
the analysis of data. The fields extracted were the following (Figure 1):
- Publication year: It is the year that was used in the query, and not that contained in the
  bibliographical description of the record retrieved.
- Rank: The position that each document occupies in the search engine results page of GS.
- Full Text: Only marked when GS found a freely accessible version of the document. Then,
  some additional fields were obtained:
  - Domain: The domain where GS has found a full text version of the document.
  - Link: Link to the full text of the document.
  - Format: File type of the full text version of the document.
- Brackets: Some records display text in square brackets before the title of the document. The
  most common occurrences are: “[BOOK]” (the record is a book) and “[CITATION]” (the record
  has only been found in the reference list of another document), “[PDF]” and “[HTML]” (to
  indicate that the document has been found in those formats).
- Title: Title of the document.
- Title Link: The URL pointing to where the record has been found (it is not a link to a freely
  accessible version of the full text, since the document may be behind a paywall).
- Authors Publication Source Year Domain/Publisher: This field contains information
  about the authors, publication source, year of publication, and publisher of each document.
  However, not all this information is always displayed for all records, and it is usually cropped to
  fit one line.
  - Authors: List of authors. When the author has a public Google Scholar Citations profile,
    his/her name includes a link to his/her profile. When there are many authors, only the first
    two or three are displayed.
  - Publication source: Name of the source where the document has been published, and,
    sometimes, publication details (volume, issue, pages). This information is not always
    displayed, and when it is, it’s not always complete.
  - Year: Year when the document was published. This field has been proved to correspond
    with the field “Publication year”, previously described.
  - URL domain / Publisher: Domain where this document has been found, or, sometimes, the
    name of its publisher (only for big publishers).
- Abstract: First lines of the abstract (it is also cropped to fit a fixed space).
- GS Citations: Number of citations the document has received according to GS.
- Link to GS Citations: URL pointing to the list of citing documents in Google Scholar.
- Link to Related documents: URL pointing to the list of related documents.
- Versions: Number of versions GS has found of the documents.
- Link to Versions: URL pointing to the list of versions GS has found of the same document.
- Web of Science: This data will only appear if the query is performed from a computer
  connected to an IP range with access to Thomson Reuters’ Web of Science, and only for the
  documents that are indexed both in GS and WoS.
  - WoS Citations: Number of citations according to Web of Science.
  - WoS accession number (UT): Identification number of the document in Web of Science.
    This code allows us to accurately match a GS record with a WoS record.
  - WoS Link: URL pointing to the list of citing documents in Web of Science.
Figure 1. Fields extracted from Google Scholar's SERP
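To make two of the fields above more concrete, the following sketch separates the bracket tag from a title and splits the combined authors/source/year/domain line into its parts. It assumes that the combined line has the layout "authors - source, year - domain" with " - " as separator; that layout, the function names and the sample strings are illustrative assumptions, and the actual parsing code used in the study is not described in the text.

```python
import re

# Bracket tags mentioned in the text: [BOOK], [CITATION], [PDF], [HTML].
BRACKET_RE = re.compile(r"^\[(BOOK|CITATION|PDF|HTML)\]\s*")
# A four-digit year at the end of the source segment, optionally after a comma.
YEAR_RE = re.compile(r"(?:^|,\s*)((?:19|20)\d{2})$")


def split_title(raw_title):
    """Separate a leading bracket tag such as [BOOK] from the title."""
    match = BRACKET_RE.match(raw_title)
    if match is None:
        return None, raw_title.strip()
    return f"[{match.group(1)}]", raw_title[match.end():].strip()


def split_byline(byline):
    """Split 'authors - source, year - domain' into its parts (assumed layout)."""
    parts = [part.strip() for part in byline.split(" - ")]
    authors = [a.strip() for a in parts[0].split(",") if a.strip()]
    source, year = None, None
    if len(parts) >= 2:
        match = YEAR_RE.search(parts[1])
        if match:
            year = int(match.group(1))
            source = parts[1][:match.start()].strip() or None
        else:
            source = parts[1] or None
    domain = parts[2] if len(parts) >= 3 else None
    return {"authors": authors, "source": source, "year": year, "domain": domain}


# Made-up sample strings, not real records.
tag, title = split_title("[BOOK] An example monograph")
fields = split_byline("A Author, B Author - Journal of Examples, 1951 - example.org")
print(tag, "|", title)
print(fields)
```

Because GS crops this line and does not always display every element, a real parser would need to tolerate missing segments, which is why every part other than the authors may come back as `None` here.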
In addition to these fields, we added a few more in order to answer our questions related to: type of
the source publication, and language of the document.
Given the difficulty of ascertaining the typologies of the documents indexed in Google Scholar (this
information is not systematically provided by the search engine), we have devised three different
strategies that, combined, have allowed us to know the type of a large portion of documents in our
data set:
a) All documents where the field Brackets = “[BOOK]” have been considered as books (codified
as “B”).
b) For documents that were also indexed in WoS, GS data was merged with WoS data to obtain
the document types. The correspondence is as follows:
   Journal (“J”): “Article”, “Letter”, “Note”, “Reviews”.
   Book (“B”): “Book”, “Book Chapter”.
   Conference Proceedings (“C”): “Proceedings Papers”.
   Others (“O”): “Book Review”, “Correction”, “Correction, Addition”, “Database
   Review”, “Discussion”, “Editorial Material”, “Excerpt”, “Meeting Abstract”, “News Item”,
   “Poetry”, “Reprint”, “Software Review”.
c) Lastly, we analysed the publication source (where possible), searching for
keywords that could indicate the type of the source publication:
   Journal (“J”): “Revista”, “Anuario”, “Cuadernos”, “Journal”, “Revue”, “Bulletin”,
   “Annuaire”, “Anales”, “Cahiers”, “Proceedings”³.
   Conference Proceedings (“C”): “Proceedings”, “Congreso”, “Jornada”, “Seminar”,
   “Simposio”, “Congrès”, “Conference”, “symposi”, “meeting”.
³ The word “Proceedings” is used both for journals (e.g. “Proceedings of the National Academy
of Sciences”) and for conference proceedings (e.g. “Proceedings of the 4th Conference…”).
Initially, records containing this word in the “Publication Source” field were all considered as
conference proceedings, but a manual check was carried out to reassign those that were
really journal articles.
Combining these three strategies, we identified the document type for 71% of the 64,000 documents
in our sample. We couldn’t identify the document types for the remaining 29% because this would
have required doing it manually for 18,590 documents, which would have taken an excessive amount
of time. This information was saved in a new field called Source Type, and was codified as follows:
B: Books or book chapters.
J: Journal articles, reviews, letters and notes.
C: Conference proceedings.
O: Others (meeting abstracts, corrections, editorial material…).
Unknown: we haven’t been able to assign a source type (29% of the sample).
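The three strategies above can be combined into a single decision function, sketched below. The type mappings and keywords come from the text; the order in which the keyword lists are checked, the case-insensitive substring matching, and the `needs_manual_check` flag (standing in for the manual review of records containing "Proceedings") are assumptions made for this illustration.

```python
WOS_TYPE_MAP = {
    "J": {"Article", "Letter", "Note", "Reviews"},
    "B": {"Book", "Book Chapter"},
    "C": {"Proceedings Papers"},
    "O": {"Book Review", "Correction", "Correction, Addition", "Database Review",
          "Discussion", "Editorial Material", "Excerpt", "Meeting Abstract",
          "News Item", "Poetry", "Reprint", "Software Review"},
}
JOURNAL_KEYWORDS = ("revista", "anuario", "cuadernos", "journal", "revue",
                    "bulletin", "annuaire", "anales", "cahiers")
CONFERENCE_KEYWORDS = ("proceedings", "congreso", "jornada", "seminar", "simposio",
                       "congrès", "conference", "symposi", "meeting")


def classify(brackets, wos_type, source):
    """Return (source_type, needs_manual_check) for one record."""
    # Strategy a): the [BOOK] bracket tag identifies books.
    if brackets == "[BOOK]":
        return "B", False
    # Strategy b): use the WoS document type when the record is linked to WoS.
    if wos_type:
        for code, names in WOS_TYPE_MAP.items():
            if wos_type in names:
                return code, False
    # Strategy c): look for keywords in the publication source.
    text = (source or "").lower()
    if any(keyword in text for keyword in CONFERENCE_KEYWORDS):
        # "Proceedings" may also name a journal, so flag it for manual review.
        return "C", "proceedings" in text
    if any(keyword in text for keyword in JOURNAL_KEYWORDS):
        return "J", False
    return "Unknown", False


print(classify("[BOOK]", None, None))
print(classify(None, "Article", "Journal of Examples"))
print(classify(None, None, "Proceedings of the National Academy of Sciences"))
```

In the last call the record is provisionally coded as conference proceedings and flagged, mirroring the manual reassignment of journals whose names contain the word "Proceedings".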
As regards the language of the documents (GS doesn’t provide this information either), we used the
language in which the title and abstract of the document were written, as well as WoS data (when
available) as a basis for a new Language field.
In essence, we will show a cross-sectional view (global results) as well as a longitudinal view (results by
year, in order to detect potential changes) of this sample of documents.
The measures we have used to summarise the data are: absolute and relative frequencies of various
aspects of the documents (questions 1-5), and the Pearson correlation coefficient (questions 6-10), with p ≤ 0.01.
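For reference, the Pearson correlation coefficient between two paired series (for instance, GS citations and WoS citations of the same documents) can be computed as in the following minimal sketch. It uses toy values rather than data from the study and does not compute the associated p-value, which would require an additional significance test.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy example: perfectly proportional citation counts.
gs_citations = [100, 200, 300, 400]
wos_citations = [40, 80, 120, 160]
r = pearson_r(gs_citations, wos_citations)
print(round(r, 3))
```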
4. RESULTS
The structure we have followed to present the results of each research question is as follows: first we
describe the results we have obtained, and after that, under a separate heading called “Discussion &
limitations”, we lay out and discuss possible inquiries and uncertainties raised by these findings.
Question 1.
Which are the most cited documents in Google Scholar?
In Table 1 we present the top 25 most cited documents in Google Scholar. Additionally, Appendix A
shows the top 1% most cited documents in our sample (a total of 640 documents).
These lists faithfully reflect the all-encompassing indexing policies of Google Scholar: the
academic/scientific/technical world as opposed to the strictly scientific world displayed in traditional citation-based
databases. In this respect, we can state that GS offers an original and different view of what
the most influential documents in the academic/scientific world are, from the perspective of their
citation count. There are several reasons for this:
First, its coverage is not limited to seminal research works across the entire spectrum of scientific fields;
it also covers highly influential works aimed not only at researchers but also at people who are
training to become researchers or practitioners in their respective fields. This is attested by the
presence of statistical manuals (Handbook of Mathematical Functions with Formulas, Graphs, and
Mathematical Tables; Biostatistical Analysis; Statistical Power Analysis for the Behavioral Sciences),
laboratory manuals (Molecular cloning: a laboratory manual), manuals of research methodology
(Case study research: Design and methods), and works that have become a de facto standard in
professional practice (Diagnostic and statistical manual of mental disorders; Numerical recipes: the art
of scientific computing; Genetic algorithms in search, optimization, and machine learning).
Second, a high proportion of the highly cited documents are books (a document type that is essential
in the humanities and the social sciences as a vehicle for the communication of new results, and in
the experimental sciences as a way to consolidate and disseminate knowledge). In fact, 62% of the
top 1% most cited documents in our sample are books (see Appendix A). Moreover, books are the
document type with the highest citation average: 2,700, against an average of 1,700 for journal articles
and 2,200 for conference proceedings. The importance of books and conference proceedings is
therefore clearly demonstrated.
Although the ranking is dominated by studies from the natural sciences, and within those, especially
the life sciences, it also contains many works from the social sciences, especially from economics,
psychology, sociology and education, and also from the humanities (philosophy and history). For
instance: The structure of scientific revolutions; Diffusion of innovations; and Imagined communities:
Reflections on the origin and spread of nationalism.
Many of the works leading this ranking are clearly methodological in nature: they describe the steps of
a certain procedure or how to handle basic tools to process and analyse all kinds of data. Precisely
because they are essential to researchers, they reach such a high number of citations. This
phenomenon is widely known in bibliometrics, where it has already been observed that works that
deal with new data collecting and processing techniques or methodologies are more likely to receive a
great number of citations.
Even though, as we commented before, GS presents a very different ranking of highly cited academic
documents compared to the rankings offered by the traditional citation-based databases, in other
respects it presents a portrait of the world of research very similar to the one offered by traditional
databases. This is so because the most cited scientific documents in GS match very closely
those that have already been identified as highly cited in the Web of Science (Garfield, 2005). This
explains the high correlation found in the rankings of documents according to their number of citations
in GS and WoS (see Question 6).
Therefore, it is not surprising that the most cited document according to GS is the famous
article written by Lowry, “Protein measurement with the Folin phenol reagent”, published in 1951 in the
Journal of Biological Chemistry, where he developed a new method to measure the concentration of a
protein in a solution. The reasons for the success of this article were revealed by the author himself
(Lowry, 1977), and in a short note published in the same journal on the occasion of the journal’s hundredth
anniversary in 2005 (Kresge et al., 2005).⁴
We’ll use this article as an example in the next section to comment on some uncertainties and discuss
the possible limitations of these results.
⁴ See his profile on Google Scholar:
http://scholar.google.com/citations?user=YCS0XAcAAAAJ&hl=es
Table 1. Top 25 most cited documents in Google Scholar (1950-2013)

Document type   1st ed. pub. year   GS citations
J               1951                253,671
J               1970                221,680
J               1976                185,749
B               1982                171,004
B               1952                129,473
B               1986                108,956
B               1984                 82,538
B               1964                 80,482
B               1962                 70,662
B               1974                 68,267
J               1948                 66,851
J               1987                 63,871
J               1977                 63,767
B               1969                 63,766
B               1967                 61,158
B               1967                 60,725
B               1989                 59,764
B               1962                 55,738
J               1993                 54,642
J               1988                 52,316
J               1962                 52,011
B               1983                 51,177
J               1975                 51,150
J               1979                 50,608
B               1982                 50,471
Discussion & Limitations
1. How confident are we that the 64,000 documents that make up our sample really contain the
most cited documents in GS?
Although there is some evidence suggesting that we have been able to collect the vast
majority of the most cited documents in GS between 1950 and 2013 (as of the 28th of May
2014), as we already explained at the beginning of this study (see Introduction), there are still
some questions that should be cleared up.
To this end, first we have tried to find out if any of the documents in our sample aren’t really
highly cited documents, and second, if there are any highly cited documents that haven’t been
included in our sample. To do this, we have compared the 1,000 most cited papers in GS
against the 1,000 most cited papers in WoS between 1950 and 2013 (Figure 2).
Figure 2. Minimum number of citations received by top cited (1,000, 900, 890, 850) documents
in Google Scholar and WoS (1950-2013)
On the one hand, we have detected that the results displayed by GS for our queries become
extremely erratic in terms of their citation count from about the 900th result onwards. This
means that it is highly probable that approximately the last 100 documents for each year in
our sample (a total of 6,400 documents) aren’t actually highly cited documents, and therefore
should be excluded from the sample.
On the other hand, we have also verified that some documents in WoS with a number of citations
that slightly exceeds the threshold set by the 1,000 documents returned by GS are not present
in the first 1,000 results of the search engine.
Nonetheless, all these inconsistencies happen in the last 100 positions of each query for each
year, whereas in the first 900 the consistency is high. To sum up, despite the various
limitations described above, we can affirm that the majority of the documents in our sample
are highly cited documents.
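The kind of check summarised in Figure 2 can be illustrated with a short sketch: for one yearly query, take the citation counts in rank order and look at the lowest count found within the first 1,000, 900, 890 and 850 results. A sharp drop between the 900 and 1,000 cut-offs signals an erratic tail. The citation values below are synthetic and the function name is hypothetical.

```python
def min_citations_at(citations_by_rank, cutoffs=(1000, 900, 890, 850)):
    """Lowest citation count among the first k results, for each cutoff k."""
    return {k: min(citations_by_rank[:k])
            for k in cutoffs if 0 < k <= len(citations_by_rank)}


# Synthetic year: 900 steadily decreasing counts followed by an erratic tail.
head = [2000 - i for i in range(900)]                         # 2000 down to 1101
tail = [5 if i % 2 else 1100 - i for i in range(100)]         # alternates high and very low
citations_by_rank = head + tail

minimums = min_citations_at(citations_by_rank)
print(minimums)

# Size of the sample if the last 100 results of each of the 64 years are dropped.
trimmed_sample_size = 64 * 900
print(trimmed_sample_size)
```

With these synthetic values the minimum stays above 1,100 up to the 900th result and collapses once the last 100 results are included, which is the pattern that motivates excluding them.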
2. In order to be able to trust the results that our search strategy yielded, we must ask ourselves
if the documents in our sample were really published in the year GS says they were
published.
To answer this question we carried out two different tests. In the first place, we tested the
internal consistency of the search engine. We checked if the results displayed by GS met the
requirements of our query. We found that the year of publication of the documents according
to GS matched the year we entered in our query in practically 100% of the cases. Only two
records out of 64,000 displayed a different year to the one we typed in the search box.
Secondly, we tested the external consistency. For those documents that had been linked to a
WoS record (32,680 out of 64,000), we compared the publication year according to GS to the
one displayed in the WoS record. Since WoS is a controlled database with a minimal error
rate as regards its bibliographic information, we have used it as a benchmark. The results
showed that the publication years in GS and WoS matched in 96.7% of the cases (31,600
documents). Curiously enough, the years in which we detected the most mismatches were 2012
and 2013. Consequently, we must conclude that the error rate in the publication years is very
low for this subset of the sample.
Figure 3. Publication year mismatches between journal articles in Google Scholar and Web of
Science
However, we have observed that, in the case of books, Google lumps together all the different
editions of the same book, and systematically selects the latest edition of the book as the
primary version. As a result, GS takes the publication date of the latest edition (and not the
publication date of the first edition) as the publication date of the book. This decision, as
understandable as it is from a search point of view (users will probably want to access the
latest edition of a book), obviously affects our sample. Figure 4 displays the frequency distributions
of both the publication year of the top 600 most cited books in our sample according to
Google Scholar, and the publication year of the 1st edition of these books.
Figure 4. Differences between the publication year of the top 600 most cited books according
to Google Scholar, and the publication year of the 1st edition of these books
In any case, it should be noted that this limitation doesn’t affect the status of these books as
highly cited documents, only the year of publication assigned to them⁵. Moreover, this fact
may be the reason behind the higher number of books in the last five years of the sample (see
Question 2).
3. When we checked the number of citations to Lowry’s article again some time after collecting
our sample, we were taken by surprise by the result. As of the 21st of October, 2014, this study
had 192,841 citations according to GS (Figure 5 top). However, on the 28th of May, 2014, when
we collected our sample, this number was 253,671 (Figure 5 middle). This means that within 5
months, Lowry’s article lost no fewer than 60,000 citations. Therefore, it is currently not the
most cited article in GS, having given way to Laemmli’s work (Figure 5 bottom).
⁵ With the exception of the book Mathematical theory of communication, a special case study
expanded and commented on in Appendix B.
EC3 Working Papers Nº 19
Figure 5. Citation loss of the most cited document in Google Scholar and Web of Science
(Lowry, 1951). Panels: 21st October 2014 (top), 28th May 2014 (middle), 21st October 2014
(bottom)
The debate is served...
How is it possible that the total number of citations of a document decreases over time? What
are the reasons for these changes? Are the citation counts offered by GS stable and reliable,
and consequently, are its results regarding which documents are the most cited?
There is an explanation for this phenomenon, although it’s difficult to justify that a document
presents a lower number of citations in the present than it did in the past.
The behavior of this document in WoS is more logical, since in these months it has
accumulated a few more citations: as of the end of May 2014, it had 303,832 citations, and on
October the 21st, 2014, it had 305,202 according to the WoS count displayed in GS (Figure 5
top), and 305,248 according to WoS itself (Figure 6 bottom). The WoS data shown in GS is
updated regularly, but not in real time.
Figure 6. Citation of the most cited document in Google Scholar and Web of Science (Lowry,
1951)
Why does this phenomenon occur in GS?
The answer is related to the dynamic nature of the Web: information is added and removed
constantly, and therefore, GS always displays what is currently available on the Web. This is
explained in Google Scholar’s help pages⁶, where they warn that “Google Scholar generally
reflects the state of the web as it is currently visible to our search robots and to the majority of
users”. Presumably, this drastic change in citations took place when GS made a major
“re-crawling” of the documents in its database earlier this year (around the third week of June
2014 according to our data).
⁶ My citation counts have gone down. Help!
http://scholar.google.com/intl/en/scholar/help.html#corrections [accessed on 24th October 2014]
4. The consequences of this phenomenon in our study are self-evident: did we really collect the
most cited documents?
To this end, we collected the entire sample again on the 4th of October, 2014, and compared
the two samples to learn how many of the documents in our earlier sample are not present in
the new sample (Table 2).
Table 2. Comparison of two samples of 64,000 highly cited documents (May and October,
2014)
Only 14.7% of the 64,000 documents in the most recent sample were not also present in our
earlier sample. Moreover, most of these new documents are placed in pretty low positions in
Google Scholar’s ranking of results.
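A comparison of this kind can be sketched with a set difference. The sketch assumes that each document carries some stable identifier that survives between the two collection dates; the string identifiers and the toy data below are hypothetical.

```python
def share_of_new_documents(earlier_ids, later_ids):
    """Fraction of documents in the later sample that are absent from the earlier one."""
    earlier, later = set(earlier_ids), set(later_ids)
    if not later:
        return 0.0
    new_documents = later - earlier
    return len(new_documents) / len(later)


# Hypothetical identifiers standing in for the May and October samples.
may_sample = ["doc-a", "doc-b", "doc-c", "doc-d"]
october_sample = ["doc-a", "doc-b", "doc-c", "doc-e"]

share = share_of_new_documents(may_sample, october_sample)
print(f"{share:.1%} of the later sample is new")
```

The result depends heavily on how documents are identified: matching on titles alone, for instance, would treat a record whose title was edited between the two dates as a new document.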
5. Are we sure that all versions of a same document (not only different editions or reprints, but
also translations to other languages) have been successfully merged, and that all their
respective citations have been added, removing any possible duplicates?
GS has declared that they do exactly this (Verstak & Acharya, 2013), but we don’t have
empirical data to comment on the potential errors regarding this issue.
Nevertheless, it is not difficult to find obvious errors, like the case of the classic work in
Molecular Biology “Molecular cloning: a laboratory manual” (Figure 7), where it is clear that
there are still many different versions with a high number of citations that haven’t been
merged. This, of course, is an exceptional case. Normally, documents will not present as
many versions as this example (See Question 7; Table 7), nor as many citations.
Figure 7. A few versions of Molecular cloning: a laboratory manual, by J. Sambrook et al. that
Google hasn’t merged
Lastly, a few well-known issues in bibliometrics (Garfield, 2005) should be kept in mind before
proceeding to observe the ranking of the top 1% most cited documents in Google Scholar (see
Appendix A). First, the citation windows: a document published in 1950 has had 64 years to receive
citations, whereas a document published in 2013 has had only one year. Secondly, the different
paces at which obsolescence takes place in the different scientific fields: generally, documents stop
being cited at some point after their publication date. Thirdly, the exponential growth of production: as
production volumes increase, the number of citations also increases.
Question 2.
Which are the most cited document types in Google Scholar?
Document types and their evolution
The typologies of the documents in our sample are shown in Figure 8. As we stated in the methods
section, we have been able to determine the typology of 45,410 documents in our sample (71%). The
typologies of the remaining 29% are unknown.
Figure 8. Document types of the highly cited documents in Google Scholar
There is a clear predominance of journal articles, which make up a much higher fraction of the total
than books and book chapters. The presence of conference proceedings is almost non-existent.
Admittedly, this distribution might have been different if we could have defined the document type of
the remaining 29% of our sample.
Figure 9 presents this distribution from a longitudinal perspective, where we find the following three
phenomena:
- A steady decrease over time in the number of documents with an unknown document type.
- A constant increase in the number of books, which become the most frequent document type in
the last five years (2009-2013). As an example, in the 1,000 results for the year 2013, we only find
27 journal articles. What’s the reason for this obvious overrepresentation of the book format over
the rest of the formats in the last years? We believe this phenomenon has very much to do with
the decision of using the most recent edition of a book (and therefore, the most recent publication
date), as the primary version of the document (see Question 1, Figures 3-4). As a result, a
classic book originally published in 1965, for example, and reprinted over the years with its
latest edition published in 2012, will be considered as having been published in 2012. Since
Google Scholar only presents 1,000 results for any given query, and we only collected information
about the primary versions of the documents, these books are overshadowing other publications
that have really been published in these years.
- Conference proceedings play an insignificant role in this sample, although they achieve greater
presence during the last decade of the twentieth century.
Figure 9. Document types of the highly cited documents in Google Scholar, broken down by
years
Citations and document types
Books are the document type with the highest average number of citations per document
(Table 3), followed by conference proceedings. Journal articles rank third in this list.
Table 3. Citations according to document types
Journals containing highly cited documents (1950-2013)
The articles contained in our sample have been published in a total of 3,131 different journals. In
Table 4 we show the list of journals where the majority of articles are concentrated. As might be
expected, multidisciplinary journals (Science and Nature) are the ones with the highest number of
highly cited articles, followed by the major journals in the natural sciences (Physics and Chemistry).
As regards the social sciences, only economics and psychology journals (American Economic
Review, and Econometrica) are capable of reaching prominent positions.
Table 4. Top 25 Most frequent journals in the highly cited documents in Google Scholar
Discussions & Limitations
1. Google Scholar does not provide document type information systematically for all its
documents (only for books).
Because of this, we could not determine the document types of the entire data set, since this
would have required a manual inspection of the remaining 18,590 documents. If we did this,
our guess is that the fraction of books and book chapters would increase, since this is the
typology that GS has more trouble identifying.
2. Would the weight of the book format be different over the years, had Google Scholar decided
to take the first edition of books as their primary version?
Without a doubt, yes (see Question 1; Figure 4).
Question 3.
In what languages are the most cited documents in Google Scholar
written?
In Figure 10 we show the document distribution according to language. As we can see, English
dominates over the rest of languages as the most widely used language for scientific communication,
accounting for 92.5% of all the documents in our sample. The second and third places are occupied
by Spanish and Portuguese respectively, but neither of them reaches even 2% of the total.
Figure 10. Distribution of languages used in the highly cited documents in GS
In Figure 11 we can observe the same data broken down by years. The results for the language
variable are much more stable through the years than the ones found for the document types. In this
case, the English language predominates in every year, with a difference between its maximum and
minimum share of less than 10 percentage points (87% in 2013, and 95% in 1991).
Figure 11. Distribution of languages in the highly cited documents in GS by years of
publication
The “Others” category includes the following languages: Italian, Swedish, Indonesian, Finnish,
Danish, Bulgarian, Polish, Norwegian, Turkish, Latin, Slovenian, Serbian, Dutch, Macedonian,
Malayan, Japanese, Czech, Estonian, Slovak, Mongolian, Catalan, Croatian, Lithuanian, and
Ukrainian.
Discussions & Limitations
1. As with document types, Google Scholar does not provide information about the languages in
which the documents it indexes are written.
Because of this, we developed a strategy to determine this information, using WoS data
where possible (around 50% of the cases), and the title and abstract of the document in all
the other cases. This approach, however, may have introduced an overrepresentation of the
English language, since it is usual for a document written in a language other than English to
provide its title and abstract in English as well, for the purpose of being indexed in
international databases.
2. Additionally, our sample may contain records that are in fact translations of other documents
(which may also be present in our sample).
As we pointed out in previous studies (Martín et al. 2014), Google Scholar usually fails to
group together different translations of the same document. This is the case of journals that are
published both in English and in another language, or books that are translated into various
languages (see Figure 12). This issue has an immediate effect for the works affected by this
problem: their citations are scattered across different records, and this could affect their status
as highly cited documents.
Figure 12. Example of language versions (Chinese, English, German, Spanish, French) of
The structure of scientific revolutions, by Kuhn
Question 4.
How many highly cited documents are freely accessible?
The percentage of documents for which Google Scholar provides a freely accessible full text link can
be observed in Figure 13. Over 40% of the documents in our sample provided a full text link, and
these links are mostly concentrated in the last two decades. The lower rate of records with an open
access link in the last four years might be explained by journals’ and publishers’ embargo policies.
Additionally, the high percentage of books in the last 5 years of the sample may also play a role.
Figure 13. Percentage of freely accessible highly cited documents in Google Scholar. Global
results for the 1950-2013 period (left), and broken down by decades (right)
These results are consistent with those published by Archambault et al. in 2013, (since they also
found that over 40% of the articles from their sample were freely accessible from Google Scholar),
and much higher than the results obtained by Khabsa and Giles (2014), and Björk et al. (2010), who
found only 24% and 20.4% of open access documents respectively.
What file types are the most commonly used to store these highly cited
documents?
Full text links point to documents in a variety of formats. The most common one is the PDF format,
followed by the HTML format. Figure 14 presents the distribution of these formats for all the
documents that provide a Full Text Link. These results confirm the data previously identified, among
others, by Aguillo, Ortega, Fernández & Utrilla (2010) and Orduña-Malea, Serrano-Cobos &
Lloret-Romero (2009).
Figure 14. File Formats of the highly cited documents in Google Scholar freely accessible
(1950-2013)
Figure 15 shows the same data broken down by years. We can see that the predominance of the
PDF format is present throughout the entire range of years. However, it is also noteworthy that the
HTML format has started gaining more presence for documents published in the last 25 years, with a
peak of almost 20% of the share in 2010.
Figure 15. File Formats of the highly cited documents in Google Scholar that are freely
accessible, broken down by years (1950-2013)
Which are the main providers of these documents?
We have found a total of 5,715 different providers of Full Text Links in our sample. However, a group
of 35 providers account for more than a third of all the links. Table 5 shows these main providers.
Table 5. Main Full Text providers
If we analyse the top-level domains of these links, the most frequent are academic institutions (.edu)
and organizations (.org). Moreover, the number of links provided by academic institutions is probably
higher than 6,136, because there are many universities that use national top-level domains instead of
.edu. Table 6 shows the 20 most frequent top-level domains.
This means that, at least as far as our sample is concerned, GS obtains the full texts of highly
cited documents mainly from universities (institutional repositories) and public organizations
(working papers, grey literature), and not from commercial publishers. Of special note is the role
of the scientific social network ResearchGate, where researchers often upload their publications.
Table 6. Main top-level domains contributing Full Text links in Google Scholar
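A tally of top-level domains such as the one in Table 6 could be produced along the following lines. This is a sketch under simple assumptions: the top-level domain is taken to be the last dot-separated label of the host name, and the URLs are placeholders, not links from the sample.

```python
from collections import Counter
from urllib.parse import urlparse


def top_level_domain(url):
    """Return the last label of the host name (e.g. 'edu'), or '' if there is none."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else ""


def tld_counts(urls):
    """Count full text links per top-level domain, skipping unparseable URLs."""
    return Counter(tld for tld in map(top_level_domain, urls) if tld)


# Placeholder links, invented for illustration only.
links = [
    "http://www.example.edu/papers/a.pdf",
    "https://repository.example.org/b.pdf",
    "http://preprints.example.org/c.pdf",
    "http://repositorio.example.es/d.pdf",
]
counts = tld_counts(links)
print(counts.most_common())
```

As the text notes, a count like this underestimates the weight of academic institutions, because universities under national domains (such as `.es` in the placeholder list) are not counted as `.edu`.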
Discussions & Limitations
1. Do these links really point to full text versions of the documents?
More rigorous analyses should be carried out in order to determine if there are false positives
among these links. For example, a freely accessible PDF document containing a review of a
book, or just the cover and the table of contents of a book could be mistaken for the book
itself.
Moreover, the dynamic nature of the web means that a link that was accessible some time
ago may no longer be available. How often does Google Scholar check that these links are
still functioning properly?
2. Our analysis deals only with the full text link provided for the version of the document GS
considers as the primary version.
However, when the primary version of a document is not freely accessible, GS points the user
to any other free version if available. Figure 16 is an example of a case where the primary
version is the publisher’s edition of a journal article, but the Full Text link is a preprint from
arXiv.
Figure 16. Primary version, Publisher and Full Text provider
3. For documents with more than one version, there may be more than one full text version of
the document.
These versions may be hosted in other domains. Again, we want to stress that we only study
the Full Text Links displayed for the primary versions of the documents.
Question 5.
How many of the highly cited documents indexed by GS are also
indexed by WoS?
Almost half of the highly cited documents according to Google Scholar are not indexed on the Web of
Science (Figure 17).
Figure 17. Percentage of highly cited documents in Google Scholar that are also indexed in the
Web of Science (1950-2013)
This is extremely relevant, although the following issues should be taken into consideration:
- The different natures of GS and WoS as databases: GS covers academic documents (scientific,
technical, educational…) published by all kinds of different sources and in all sorts of
communication channels (books, theses, reports…), whereas the coverage in Web of Science
Core Collection is oriented towards a more limited range of academic publications, i.e. journal
articles and conference communications. This would confirm our hypothesis that GS measures a
different kind of impact than the one measured by scientific databases: the academic impact.
- If we want to identify the most influential documents in the academic-scientific sphere, we must
use GS.
- GS also identifies the most relevant scientific documents with a fair amount of reliability.
Furthermore, no significant differences are observed between 1950 and 2003 (Figure 18).
However, the last decade suffers the consequences of the phenomenon we encountered in Question
2: the overrepresentation of books in the last years caused by Google Scholar’s policy of taking the
latest edition of books as their primary version.
Since Web of Science’s coverage of books is still very limited, it is not surprising that the reduction in
the percentage of documents indexed in WoS in the last years closely matches the reduction in the
number of journal articles during the same years (Figure 9).
Figure 18. Percentage of highly cited documents in Google Scholar that are also indexed in the
Web of Science, broken down by decades (1950-2013)
Discussions & Limitations
1. Is the GS-WoS connection correctly implemented?
A more in-depth study should be carried out to determine potential flaws in the matching of
documents and the frequency with which they occur:
- False positives: a document in GS matched to a document in WoS even if they’re not
really the same document. For example, a book in GS might be matched to a review of
that book indexed in WoS. This is the case of the book “The discovery of grounded
theory: Strategies for qualitative research”, which was previously presented in Table 1.
- False negatives: documents indexed both in GS and WoS for which a connection hasn’t
been established.
As a first approximation, we have selected the 398 most cited WoS documents between 1950
and 2013 that, according to their WoS ID (accession number), weren’t present in our GS
sample. We have searched the titles of these documents on Google Scholar and found that
382 (96%) were in fact indexed in Google Scholar, and 300 of them were also connected to a
different WoS record.
Therefore, these mistakes arise from incorrect connections between Google Scholar and Web
of Science records, caused by the existence of various records with the same name in WoS.
For example, a case where a document in Google Scholar has been connected to the
Correction of an article in WoS, and not to the article itself is shown in Figure 19.
Figure 19. Incorrect connections between Google Scholar and Web of Science
records
2. Is it possible that some highly cited articles according to the Web of Science are not indexed
on Google Scholar?
As noted earlier in Question 1, this may have happened in very few cases, but not among
the very highly cited (30,000 most cited documents in our sample).
3. The overrepresentation of books in the last decade
Again, this is one of the flaws in our sample, since it has caused many journal articles
published in the last years of the sample (2003-2013) that have received many
citations to be left out in favor of books that were first published many years ago.
Question 6.
Is there a correlation between the number of citations that these
highly cited documents have received in GS and the number of
citations they have received in WoS?
We have calculated Pearson’s correlation coefficient for the number of citations that documents have
received according to Google Scholar and the Web of Science, by year. The average correlation is
0.8 (calculated only for documents that are in both sources, which are 32,680). Figure 20 shows the
Pearson correlation coefficient for each of the years in our sample.
Figure 20. Pearson correlation coefficient between Google Scholar and Web of Science
citations (1950-2013)
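The per-year calculation can be sketched as follows. This is an illustrative implementation of the standard Pearson formula rather than the tooling used in the study; the `(year, gs_citations, wos_citations)` tuple layout and the toy values are assumptions.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlation_by_year(records):
    """Group (year, gs_citations, wos_citations) tuples by year and correlate each group."""
    by_year = defaultdict(lambda: ([], []))
    for year, gs, wos in records:
        by_year[year][0].append(gs)
        by_year[year][1].append(wos)
    return {year: pearson(gs, wos) for year, (gs, wos) in sorted(by_year.items())}


# Toy values, invented for illustration only.
records = [
    (1950, 100, 80), (1950, 200, 160), (1950, 300, 240),
    (1951, 10, 30), (1951, 20, 20), (1951, 30, 10),
]
per_year = correlation_by_year(records)
average_r = sum(per_year.values()) / len(per_year)
print(per_year, average_r)
```

Averaging the yearly coefficients gives every year the same weight regardless of how many documents it contributes, which is not the same as computing a single coefficient over the pooled data.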
This finding is consistent with the results of many previous studies (Sanderson 2008; Kousha &
Thelwall 2008; Meho & Rogers 2008; Franceschet, 2010; Delgado López-Cózar & Cabezas 2013;
Delgado López-Cózar & Repiso 2013), which also found a high correlation among the journal indicators
published by Google Scholar/Google Scholar Metrics and the Web of Science/Journal Citation
Reports. However, none of these studies had analysed a sample as large as this one (32,680
documents).
It is common among the studies that compare Google Scholar and the Web of Science to quantify the
number of citations they have been able to find for the documents they index. In our sample, 91.6% of
the documents have received more citations in GS than in WoS. Only 3,079 documents (9.4%) have
more citations according to WoS than in GS. Furthermore, the average number of citations per
document in GS is 1.79, and 1.08 in WoS, which means that on average, GS has roughly 66% more
citations per document than WoS (1.79 / 1.08 ≈ 1.66).
Discussions & Limitations
1. As in question 5, the quality of the matching between GS and WoS plays an important part.
2. The instability of Google Scholar’s indicators is also an important factor and should be further
analysed.
As an example, Lowry’s classic article had 253,671 citations at the end of May, 2014, when
we collected the data (see Table 1), but on August the 5th the count had gone down to
191,669 (Figure 21). WoS data seems to be much more stable, but it also went down from
304,893 citations in May, to 304,667 in August (See also Question 1, Figure 5).
Figure 21. The most cited scientific article in history, according to Google Scholar (top), and
WoS (bottom). Screen capture from 7th of August, 2014
Question 7.
How many versions of these highly cited documents has GS
detected?
One of the most interesting features of Google Scholar as an academic search engine is its ability to
identify and connect all the different instances of the same document that have been deposited
across the Web. We should bear in mind that a document can be stored in various locations: the
journal publisher’s webpage (Cell), databases (Pubmed), aggregators (Ingenta), library catalogues
(Dialnet), subject or institutional repositories, and authors’ personal or institutional web pages.
Moreover, documents might go through various versions and revisions, and they can be cited in
different forms. Google acknowledges this reality and tries to find a solution.
Excerpt from Verstak, AA and Acharya, A (2013). Identifying multiple versions of documents. U.S.
Patent No. 8,589,784. Washington, DC: U.S. Patent and Trademark Office:
“[…] it is typical that a particular document or portion thereof, appears
in a number of different versions or forms in various online repositories.
This generally results in multiple versions of a document being included
in the search results for any given query. Because the inclusion of
different versions of the same document does not provide additional
useful information, this increase in the number of the search results
does not benefit users. Also, search results including different versions
of the same document may crowd out diverse contents that should be
included. These problems have seriously affected the quality of a
search result provided by a search engine.
Another problem arises in systems in which there are multiple versions
of documents present. Documents in a document collection will have a
number of citations to it by other documents. This is particularly the
case for academic documents, legal documents, and the like. The
number of citations (citation count) to a document is often reflective of
the importance, significance, or quality of the document. Where there
are different versions of a document present in a repository, each with
its own citation count, a user does not have an accurate assessment of
the actual significance, importance or quality of the document based on
the individual citation counts.
For these reasons, it would be desirable to identify documents that are
different versions of the same document in a document collection. It
would also be desirable to manage these documents in an efficient
manner such that the search engine can furnish the most appropriate
and reliable search result.”
83% of the documents in our sample have more than one version, whereas 40% have 6 or more
versions, 19% have 10 or more versions, and 200 documents have more than 100 versions (0.1%).
The distribution of documents according to their number of versions can be observed in Table 7:
Table 7. Distribution of documents according to their number of versions
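The cumulative shares quoted above (more than one version, 6 or more, 10 or more, more than 100) can be derived from the per-document version counts as in this minimal sketch. The version counts below are invented for illustration and do not come from the sample.

```python
def share_with_at_least(version_counts, threshold):
    """Fraction of documents whose number of versions is >= threshold."""
    if not version_counts:
        return 0.0
    return sum(1 for v in version_counts if v >= threshold) / len(version_counts)


# Invented version counts for ten hypothetical documents.
versions = [1, 2, 3, 6, 12, 150, 1, 8, 10, 4]

thresholds = [
    ("more than one version", 2),
    ("6 or more versions", 6),
    ("10 or more versions", 10),
    ("more than 100 versions", 101),
]
shares = {label: share_with_at_least(versions, t) for label, t in thresholds}
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

Note that the categories are cumulative, so a document with 150 versions is counted in all four of them.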
Discussions & Limitations:
1. Does GS correctly identify all versions of a same document? Does it make mistakes, like
linking a document with a different document (i.e., a review of that document, or a citation
found in the list of references of another document), or failing to connect two records that
refer to the same document? How frequently does it make these mistakes?
In order to successfully answer these questions, we would need to analyse a sample of
documents and study all their versions individually. While we carry out this study, we present
an illustrative example in Appendix B.
Question 8.
Is there a correlation between the number of versions GS has
detected for these documents, and the number of citations they have
received?
Using Pearson’s correlation coefficient, we have been able to determine that there is no correlation
whatsoever between the number of citations of a document in Google Scholar and its number of
versions (r = 0.2**). Calculating it by year of publication yields similar results (Figure 22).
Figure 22. Pearson's correlation between the nº of citations and nº of versions in Google
Scholar documents (64,000 most cited documents in Google Scholar; 1950-2013)
Question 9.
Is there a correlation between the number of versions Google
Scholar has detected for these documents, and their position in the
result pages?
Using Pearson’s correlation coefficient, we have also determined that there is no correlation
whatsoever between the number of versions of a document in Google Scholar and the position it
occupies in the search engine results page (Figure 23). The average correlation for the results we
collected from 64 queries is r = -0.2**.
Figure 23. Pearson's correlation between the number of versions of the documents in Google
Scholar and their rank in the SERP
Question 10.
Is there some relation between the positions these documents
occupy in the search engine result pages, and the number of
citations they have received?
After calculating the Pearson correlation for each of the years in our queries, we obtained an average
r = 0.9** (Figure 24). These results indicate that the most important factor in the calculation of the
position a document will occupy in Google Scholar’s SERP is its citation count, which is in line
with Google Scholar’s own statement in this regard.
Figure 24. Pearson correlation between the number of citations of documents in Google Scholar
and the position they occupy in the Search Engine Result Page
Moreover, according to the scatterplot in Figure 25, the correlation is almost perfect until we reach the
last 100 results of the queries, but then the correlation becomes much more tenuous. If we calculate
the Pearson correlation for the first 900 and the last 100 results of each query separately, the average
correlation for all years is 0.97** and 0.61** respectively. Clearly, the problem is restricted to the tail of
the distribution.
Figure 25. Relationship between the number of citations of documents in Google Scholar and
the position they occupy in the Search Engine Result Page
5. CONCLUSIONS
As we’ve seen, the analysis of GS provides a very different view of which are the
most influential academic, scientific and technical documents for the scientific, professional and
educational community. This fact can be explained by Google Scholar’s own nature:
- Google Scholar’s crawlers sweep the entire academic web: the most well-known scholarly
publishers (such as Elsevier, Springer, Sage, Taylor & Francis, IEEE, ACS, ACM,
Macmillan, Wiley, Oxford University Press); their digital hosts/facilitators (such as HighWire
Press, MetaPress, Ingenta); societies and other scholarly organizations (such as the
American Physical Society, American Chemical Society, ACM), government agencies
(National Institutes of Health, National Oceanic and Atmospheric Administration, U.S.
Geological Survey), databases (Pubmed, ERIC), disciplinary repositories (such as arXiv.org,
Astrophysics Data System, RePEc, SSRN, CiteBase), institutional repositories from
universities or research centers, library catalogs (Dialnet), as well as personal web pages
from researchers, professors, research groups, departments, faculties… hosted inside the
servers of the university or research center they belong to.
- While traditional citation-based databases deal with the strictly scientific world (mainly journal
articles, conference communications, and some books), Google Scholar’s aim is to index all
kinds of scientific documents (scientific and professional journals, conferences, books,
working papers, reports…), as well as educational documents (master’s and doctoral theses,
teaching materials…), and technical and professional documents (reports, patents, American
case law, annuals…) circulating on the Web.
- It covers documents written in all languages and from all countries.
In conclusion, thanks to the wide and varied sources from which GS feeds, we are able to measure
not only scientific impact, but also educational and professional impact in the broadest sense of the
term (Kousha and Thelwall, 2008).
At the same time, as regards strict scientific impact, the analysis of GS data provides results very
similar to those obtained from traditional citation-based databases, with the advantage of
retrieving a larger and more varied number of citations, since they come from a wider range of
document types, geographical environments, and languages other than English.
The profile of the average highly cited document is: a book or journal article written in English and
available online in PDF format.
The rest of the findings of this study can be summarised as follows:
- 40% of the highly cited documents in GS are freely accessible, mostly from educational
  institutions (mainly universities) and other non-profit organizations. The availability of these
  documents is essential for GS as a search engine.
- Almost half of these highly cited documents are not indexed in Web of Science, which for
  many years has been considered the most prestigious scientific information database.
- There is a high correlation (r = 0.8) between the number of citations of these documents in
  GS and their citations in WoS.
- GS has detected more than one version for 83.17% of the documents in our sample.
- There is no correlation between the number of versions GS has detected for these documents
  and the number of citations they have received.
- There is no correlation between the number of versions GS has detected for these
  documents and their position in the search engine result pages (SERPs).
- There is a high correlation (r = 0.9) between the positions these documents occupy in the
  result pages and the number of citations they have received, at least in queries that only use
  the filtering option to select the documents published in a given year.
REFERENCES
Aguillo, I. F., Ortega, J. L., Fernández, M., & Utrilla, A. M. (2010). Indicators for a webometric ranking
of open access repositories. Scientometrics, 82(3), 477-486.
Aksnes, D. W. (2003). Characteristics of highly cited papers. Research Evaluation, 12(3), 159-170.
Aksnes, D. W., & Sivertsen, G. (2004). The effect of highly cited papers on national citation indicators.
Scientometrics, 59(2), 213-224.
Bornmann, L. (2010). Towards an ideal method of measuring research performance: Some
comments to the Opthof and Leydesdorff (2010) paper. Journal of Informetrics, 4(3), 441-443.
Bornmann, L., & Mutz, R. (2011). Further steps towards an ideal method of measuring citation
performance: the avoidance of citation (ratio) averages in field-normalization. Journal of
Informetrics, 5(1), 228-230.
Bornmann, L., de Moya-Anegón, F., & Leydesdorff, L. (2011). The new excellence indicator in the
World Report of the SCImago Institutions Rankings 2011. arXiv preprint arXiv:1110.2305.
Delgado López-Cózar, E., & Repiso, R. (2013). The Impact of Scientific Journals of Communication:
Comparing Google Scholar Metrics, Web of Science and Scopus. Comunicar, 21(41), 45-52.
Delgado López-Cózar, E., & Cabezas-Clavijo, Á. (2013). Ranking journals: could Google Scholar
Metrics be an alternative to Journal Citation Reports and Scimago Journal Rank? Learned
Publishing, 26(2), 101-114. DOI: http://dx.doi.org/10.1087/20130206
Franceschet, M. (2010). A comparison of bibliometric indicators for computer science scholars and
journals on Web of Science and Google Scholar. Scientometrics, 83(1), 243-258.
Garfield, E. (2005). The Agony and the Ecstasy: The History and Meaning of the Journal Impact
Factor. International Congress on Peer Review And Biomedical Publication. Chicago, September
16, 2005. http://www.garfield.library.upenn.edu/papers/jifchicago2005.pdf
Glänzel, W., & Czerwon, H. J. (1992a). What are highly cited publications? A method applied to
German scientific papers, 1980-1989. Research Evaluation, 2(3), 135-141.
Glänzel, W., & Schubert, A. (1992b). Some facts and figures on highly cited papers in the sciences,
1981-1985. Scientometrics, 25(3), 373-380.
Glänzel, W., Rinia, E. J., & Brocken, M. G. (1995). A bibliometric study of highly cited European
physics papers in the 80s. Research Evaluation, 5(2), 113-122.
Glänzel, W., Schlemmer, B., & Thijs, B. (2003). Better late than never? On the chance to become
highly cited only beyond the standard bibliometric time horizon. Scientometrics, 58(3), 571-586.
Kousha, K., & Thelwall, M. (2008). Sources of Google Scholar citations outside the Science Citation
Index: A comparison between four science disciplines. Scientometrics, 74(2), 273-294.
Kresge, N., Simoni, R. D., & Hill, R. L. (2005). The most highly cited paper in publishing history:
Protein determination by Oliver H. Lowry. Journal of Biological Chemistry, 280(28), e25-e25.
http://www.jbc.org/content/280/28/e25.full.pdf
Levitt, J. M., & Thelwall, M. (2009). The most highly cited Library and Information Science articles:
Interdisciplinarity, first authors and citation patterns. Scientometrics, 78(1), 45-67.
Lowry, OH. (1977). Commentary by Lowry, OH on "Protein measurement with folin phenol reagent,"
Current Contents/Life Sciences (1):7 (January 3, 1977).
http://garfield.library.upenn.edu/classics1977/A1977DM02300001.pdf
Maltrás Barba, B. (2003). Los indicadores bibliométricos: fundamentos y aplicación al análisis de la
ciencia. Gijón: Trea.
Martín-Martín, A.; Ayllón, J.M.; Orduña-Malea, E.; Delgado López-Cózar, E. (2014). Google Scholar
Metrics 2014: a low cost bibliometric tool. EC3 Working Papers, 17: 8 July 2014.
http://arxiv.org/ftp/arxiv/papers/1407/1407.2827.pdf
Meho, L. I., & Rogers, Y. (2008). Citation counting, citation ranking, and h-index of human-computer
interaction researchers: A comparison between Scopus and Web of Science. Journal of the
American Society for Information Science and Technology, 59(11), 1711-1726.
Narin, F., Frame, J. D., & Carpenter, M. P. (1983). Highly cited Soviet papers: An exploratory
investigation. Social Studies of Science, 13(2), 307-319.
Oppenheim, C., & Renn, S. P. (1978). Highly cited old papers and the reasons why they continue to
be cited. Journal of the American Society for Information Science, 29(5), 225-231.
Ortega, JL. (2014). Academic Search Engines: A Quantitative Outlook. Elsevier, Chandos Information
Professional Series.
Orduña-Malea, E., Ayllón, J.M., Martín-Martín, A., Delgado López-Cózar, E. (2014). About the size of
Google Scholar: playing the numbers. Granada: EC3 Working Papers, 18: 24 July 2014.
http://arxiv.org/pdf/1407.6239
Orduña-Malea, E., Serrano-Cobos, J., & Lloret-Romero, N. (2009). Las universidades públicas
españolas en Google Scholar: presencia y evolución de su publicación académica web. El
profesional de la información, 18(5), 493-500.
Persson, O. (2010). Are highly cited papers more international? Scientometrics, 83(2), 397-401.
Plomp, R. (1990). The significance of the number of highly cited papers as an indicator of scientific
prolificacy. Scientometrics, 19(3), 185-197.
Sanderson, M. (2008). Revisiting h measured on UK LIS academics. Journal of the American Society
for Information Science and Technology, 59(7), 1184-1190.
Smith, D. R. (2009). Highly cited articles in environmental and occupational health, 1919-1960.
Archives of environmental & occupational health, 64(sup1), 32-42.
Tijssen, R. J., Visser, M. S., & Van Leeuwen, T. N. (2002). Benchmarking international scientific
excellence: are highly cited research papers an appropriate frame of reference? Scientometrics,
54(3), 381-397.
APPENDIX A
| Document type | Bibliographic reference | 1st ed. pub. year | GS citations |
|---|---|---|---|
| J | LOWRY, O.H. et al. (1951). Protein measurement with the Folin phenol reagent. The Journal of biological chemistry, 193(1), 265-275. | 1951 | 253671 |
| J | LAEMMLI, U.K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(5259), 680-685. DOI: 10.1038/227680a0 | 1970 | 221680 |
| J | BRADFORD, M.M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein using the principle of protein dye binding. Analytical Biochemistry, 72, 248-254. DOI: 10.1006/abio.1976.9999 | 1976 | 185749 |
| B | SAMBROOK, J., FRITSCH, E. F., & MANIATIS, T. (1982). Molecular cloning: a laboratory manual. New York, Cold Spring Harbor Laboratory Press. | 1982 | 171004 |
| B | AMERICAN PSYCHIATRIC ASSOCIATION. (1952). Diagnostic and statistical manual: mental disorders. Washington, American Psychiatric Assn., Mental Hospital Service. | 1952 | 129473 |
| B | PRESS, W. H. (1986). Numerical recipes: the art of scientific computing. Cambridge [Cambridgeshire], Cambridge University Press. | 1986 | 108956 |
| B | YIN, R. K. (1984). Case study research: design and methods. Beverly Hills, Calif, Sage Publications. | 1984 | 82538 |
| B | ABRAMOWITZ, M., & STEGUN, I. A. (1964). Handbook of mathematical functions: with formulas, graphs, and mathematical tables. Washington, Government printing office. | 1964 | 80482 |
| B | KUHN, T. S. (1962). The structure of scientific revolutions. Chicago, University of Chicago Press. | 1962 | 70662 |
| B | ZAR, J. H. (1974). Biostatistical analysis. Englewood Cliffs, Prentice Hall international. | 1974 | 68267 |
| J | SHANNON, C.E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379-423. | 1948 | 66851 |
| J | CHOMCZYNSKI, P., & SACCHI, N. (1987). Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Analytical Biochemistry, 162, 156-159. DOI: 10.1006/abio.1987.9999 | 1987 | 63871 |
| J | SANGER F, NICKLEN S, & COULSON AR. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences of the United States of America, 74, 5463-7. DOI: 10.1073/pnas.74.12.5463 | 1977 | 63767 |
| B | COHEN, J. (1969). Statistical power analysis for the behavioral sciences. New York, Academic Press. | 1969 | 63766 |
| B | GLASER, B. G., & STRAUSS, A. L. (1967). The discovery of grounded theory: strategies for qualitative research. New York, Aldine de Gruyter. | 1967 | 61158 |
| B | NUNNALLY, J. C. (1967). Psychometric Theory. New York, McGraw-Hill. | 1967 | 60725 |
| B | GOLDBERG, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, Mass, Addison-Wesley Pub. Co. | 1989 | 59764 |
| B | ROGERS, E. M. (1962). Diffusion of Innovations. Pp. xiii. 367. Free Press of Glencoe, New York; Macmillan, New York: London. | 1962 | 55738 |
| J | BECKE, A.D. (1993). Density Functional Thermochemistry. III. The Role of Exact Exchange. J. Chem. Phys., 98, 5648-5652. DOI: 10.1063/1.464913 | 1993 | 54642 |
| J | LEE, C., YANG, W. & PARR, R.G. (1988). Development of the Colle-Salvetti correlation-energy formula into a functional of the electron density. Physical Review B, 37(2), 785-789. DOI: 10.1103/PhysRevB.37.785 | 1988 | 52316 |
| J | MURASHIGE, T. & SKOOG, F. (1962). A revised medium for rapid growth and bio assays with tobacco tissue cultures. Physiologia Plantarum, 15, 473-497. DOI: 10.1111/j.1399-3054.1962.tb08052.x | 1962 | 52011 |
| B | ANDERSON, B. R. O. (1983). Imagined communities: reflections on the origin and spread of nationalism. London, Verso. | 1983 | 51177 |
| J | FOLSTEIN, M.F., FOLSTEIN, S.E. & MCHUGH, P.R. (1975). "Mini-mental state": a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189-198. DOI: 10.1016/0022-3956(75)90026-6 | 1975 | 51150 |
| J | TOWBIN, H., STAEHELIN, T. & GORDON, J. (1979). Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications. Proceedings of the National Academy of Sciences of the United States of America, 76(9), 4350-4354. DOI: 10.1073/pnas.76.9.4350 | 1979 | 50608 |
| B | PAXINOS, G., & WATSON, C. (1982). The rat brain in stereotaxic coordinates. Sydney [etc.], Academic Press. | 1982 | 50471 |
| J | ALTSCHUL, S.F. et al. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403-410. DOI: 10.1006/jmbi.1990.9999 | 1990 | 50437 |
| J | ALTSCHUL, S.F. et al. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25(17), 3389-3402. DOI: 10.1093/nar/25.17.3389 | 1997 | 50052 |
| J | ZADEH, L.A. (1965). Fuzzy sets. Information and Control, 8(3), 338-353. DOI: 10.1016/S0019-9958(65)90241-X | 1965 | 49496 |
| B | GAREY, M. R., & JOHNSON, D. S. (1979). Computers and intractability: a guide to the theory of NP-completeness. San Francisco, W.H. Freeman. | 1979 | 48816 |
| B | RAWLS, J. (1971). A theory of justice. Cambridge, MA, Belknap Press of Harvard University Press. | 1971 | 48792 |
| J | THOMPSON, J.D., HIGGINS, D.G. & GIBSON, T.J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic acids research, 22(22), 4673-4680. DOI: 10.1093/nar/22.22.4673 | 1994 | 47907 |
| B | SIEGEL, S. (1956). Nonparametric statistics for the behavioral sciences. New York, McGraw-Hill. | 1956 | 47805 |
| B | VYGOTSKY, L. S. (1978). Mind in society: the development of higher psychological processes. Cambridge, Mass, Harvard University Press. | 1978 | 47664 |
| B | BORN, M., & WOLF, E. (1959). Principles of optics: electromagnetic theory of propagation, interference and diffraction of light. London, Pergamon Press. | 1959 | 47486 |
| B | GOLUB, G. H., & VAN LOAN, C. F. (1983). Matrix computations. Baltimore, Md, The Johns Hopkins University Press. | 1983 | 47083 |
| J | FOLCH, J. et al. (1957). A simple method for the isolation and purification of total lipids from animal tissues. J Biol Chem, 226(1), 497-509. DOI: 10.1007/s10858-011-9570-9 | 1957 | 45728 |
| J | SHELDRICK, G.M. (2007). A short history of SHELX. Acta Crystallographica Section A: Foundations of Crystallography, 64(1), 112-122. DOI: 10.1107/S0108767307043930 | 2007 | 45208 |
| B | MILES, M. B., & HUBERMAN, A. M. (1984). Qualitative data analysis: a sourcebook of new methods. London, Sage Publications. | 1984 | 45137 |
| J | BARON, R.M. & KENNY, D.A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of personality and social psychology, 51(6), 1173-1182. DOI: 10.1037/0022-3514.51.6.1173 | 1986 | 44043 |
| B | GREENE, W. H. (1990). Econometric analysis. New York, Macmillan. | 1990 | 43955 |
| B | TABACHNICK, B. G., & FIDELL, L. S. (1983). Using multivariate statistics. New York, Harper & Row. | 1983 | 43474 |
| J | KAPLAN, E.L. & MEIER, P. (1958). Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53(282), 457-481. DOI: 10.2307/2281868 | 1958 | 43293 |
| B | GRADSHTEYN, I. S., & RYZHIK, I. M. (1965). Table of integrals, series, and products. New York, Academic Press. | 1965 | 42948 |
| B | BANDURA, A. (1986). Social foundations of thought and action: a social cognitive theory. Englewood Cliffs, N.J., Prentice-Hall. | 1986 | 42791 |
| J | JENSEN, M.C. & MECKLING, W.H. (1976). Theory of the firm: Managerial behavior, agency costs and ownership structure. Journal of Financial Economics, 3(4), 305-360. DOI: 10.1016/0304-405X(76)90026-X | 1976 | 42702 |
| B | HAIR, J. F. et al. (1998). Multivariate data analysis. Upper Saddle River, NJ: Pearson Prentice Hall. | 1998 | 41984 |
| B | FREIRE, P. (1970). Pedagogy of the oppressed. New York (N.Y.), Seabury. | 1970 | 41463 |
| B | FELLER, W. (1950). An introduction to probability theory and its applications. New York, Wiley. | 1950 | 41135 |
| B | FOUCAULT, M. (1977). Discipline and punish: the birth of the prison. New York, Pantheon Books. | 1977 | 41076 |
| J | PERDEW, J.P., BURKE, K. & ERNZERHOF, M. (1996). Generalized Gradient Approximation Made Simple. Physical review letters, 77(18), 3865-3868. DOI: 10.1103/PhysRevLett.78.1396 | 1996 | 40868 |
| B | HOLLAND, J. H. (1975). Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. Ann Arbor, University of Michigan Press. | 1975 | 40031 |
| B | GIANNETTI, F., & LUISE, M. (2007). Spread Spectrum Signals for Digital Communications. In: Handbook of Computer Networks: Key Concepts, Data Transmission, and Digital and Optical Networks, Volume 1, 675-691. | 2007 | 39891 |
| J | SHELDRICK, G.M. et al. (1993). The application of direct methods and Patterson interpretation to high-resolution native protein data. Acta crystallographica. Section D, Biological crystallography, 49(Pt 1), 18-23. DOI: 10.1107/S0907444992007364 | 1993 | 39807 |
| B | LINCOLN, Y. S., & GUBA, E. G. (1985). Naturalistic inquiry. Beverly Hills, Calif, Sage. | 1985 | 37883 |
| J | LIVAK, K.J. & SCHMITTGEN, T.D. (2001). Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods (San Diego, Calif.), 25(4), 402-408. DOI: 10.1006/meth.2001.1262 | 2001 | 37688 |
| B | LAVE, J., & WENGER, E. (1991). Situated learning: legitimate peripheral participation. Cambridge, England, Cambridge University. | 1991 | 37459 |
| J | DEMPSTER, A.P., LAIRD, N.M. & RUBIN, D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society Series B (Methodological), 39(1), 1-38. | 1977 | 37353 |
| B | SZE, S. M. (1969). Physics of semiconductor devices. New York, J. Wiley and Sons. | 1969 | 37134 |
| B | STRAUSS, A., & CORBIN, J. (1990). Basics of qualitative research: grounded theory procedures and techniques. Newbury Park, Sage. | 1990 | 36986 |
| J | COX, D.R. (1972). Regression models and life tables. Journal of the Royal Statistical Society. Series B, 34(2), 187-220. | 1972 | 36953 |
| B | SENGE, P. M. (1990). The fifth discipline: the art and practice of the learning organization. New York, Doubleday/Currency. | 1990 | 36478 |
| J | SAITOU, N. & NEI, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution, 4(4), 406-425. | 1987 | 36207 |
| B | SCHÖN, D. A. (1983). The reflective practitioner: how professionals think in action. New York, Basic Books. | 1983 | 35852 |
| B | JACKSON, J. D. (1962). Classical electrodynamics. New York, Wiley. | 1962 | 35849 |
| J | BLIGH, E.G. & DYER, W.J. (1959). A rapid method for total lipid extraction and purification. Canadian Journal of Biochemistry and Physiology, 37(8), 911-917. | 1959 | 35095 |
| B | CORMEN, T. H., LEISERSON, C. E., & RIVEST, R. L. (1990). Introduction to algorithms. Cambridge, Mass, MIT Press. | 1990 | 35050 |
| B | COVER, T. M., & THOMAS, J. A. (1991). Elements of information theory. New York, J. Wiley. | 1991 | 34674 |
| B | HAYKIN, S. S. (1994). Neural networks: a comprehensive foundation. New York, Macmillan. | 1994 | 34522 |
| J | KOTLER, P. (2011). Reinventing Marketing to Manage the Environmental Imperative. Journal of Marketing, 75(4), 132-135. | 2011 | 34479 |
| B | WINER, B. J. (1962). Statistical principles in experimental design. New York, McGraw-Hill. | 1962 | 34118 |
| J | BARNEY, J. (1991). Firm Resources and Sustained Competitive Advantage. Journal of Management, 17(1), 99-120. DOI: 10.1177/014920639101700108 | 1991 | 33976 |
| B | VAPNIK, V. N. (1995). The nature of statistical learning theory. New York, Springer-Verlag. | 1995 | 33506 |
| B | HOFSTEDE, G. (1980). Culture's consequences: international differences in work-related values. Beverly Hills, Sage Publications. | 1980 | 33340 |
| B | HOSMER, D. W., & LEMESHOW, S. (1989). Applied logistic regression. New York, John Wiley & Sons. | 1989 | 33306 |
| B | CRESWELL, J. W. (1994). Research design: qualitative and quantitative approaches. Thousand Oaks, Calif, Sage. | 1994 | 33111 |
| J | BANDURA, A. (1977). Self-efficacy: toward a unifying theory of behavioral change. Psychological review, 84(2), 191-215. DOI: 10.1037/0033-295X.84.2.191 | 1977 | 33038 |
| B | GEERTZ, C. (1973). The interpretation of cultures: selected essays. New York, Basic Books. | 1973 | 33003 |
| B | BERGER, P. L., & LUCKMANN, T. (1966). The social construction of reality: a treatise in the sociology of knowledge. Garden City, N.Y., Doubleday. | 1966 | 32710 |
| J | KOHN, W. & SHAM, L.J. (1965). Self-consistent equations including exchange and correlation effects. Physical Review, 140(4A). DOI: 10.1103/PhysRev.140.A1133 | 1965 | 32699 |
| B | CHEMICAL RUBBER COMPANY (CLEVELAND, OHIO). (1913). CRC Handbook of chemistry and physics: a ready-reference book of chemical and physical data. Cleveland, Chemical Rubber Co. | 1913 | 32542 |
| B | BANDURA, A. (1997). Self-efficacy: the exercise of control. New York, W. H. Freeman. | 1997 | 32393 |
| J | IIJIMA, S. (1991). Helical microtubules of graphitic carbon. Nature, 354(6348), 56-58. DOI: 10.1038/354056a0 | 1991 | 32338 |
| B | GOFFMAN, E. (1959). The Presentation of Self in Everyday Life. New York, Doubleday Anchor Books. | 1959 | 32251 |
| B | BOX, G. E. P., & JENKINS, G. M. (1970). Time series analysis: forecasting and control. San Francisco, Holden-Day. | 1970 | 32139 |
| B | GAMMA, E. et al. (1994). Design patterns: elements of reusable object-oriented software. Reading, Mass, Addison-Wesley. | 1994 | 32067 |
| J | SOUTHERN, E. (1975). Detection of specific sequences among DNA fragments separated by gel electrophoresis. Journal of Molecular Biology, 98, 503-517. | 1975 | 31950 |
| B | PORTER, M. E. (1980). Competitive strategy: techniques for analyzing industries and competitors. New York, Free Press. | 1980 | 31532 |
| B | WILLIAMSON, O. E. (1985). The economic institutions of capitalism: firms, markets, relational contracting. New York, The Free Press. | 1985 | 31394 |
| J | KIRKPATRICK, S., GELATT, C.D. & VECCHI, M. (1983). Optimization by Simulated Annealing. Science, 220(4598), 671-680. DOI: 10.1126/science.220.4598.671 | 1983 | 31026 |
| B | NORTH, D. C. (1990). Institutions, institutional change, and economic performance. New York, Cambridge University Press. | 1990 | 31019 |
| B | BOURDIEU, P. (1979). Distinction: a social critique of the judgement of taste. London, Routledge & Kegan Paul. | 1979 | 30870 |
| B | PORTER, M. E. (1985). Competitive advantage: creating and sustaining superior performance. New York, The Free Press. | 1985 | 30532 |
| J | MOSMANN, T. (1983). Rapid colorimetric assay for cellular growth and survival: application to proliferation and cytotoxicity assays. Journal of immunological methods, 65(1-2), 55-63. DOI: 10.1016/0022-1759(83)90303-4 | 1983 | 30514 |
| B | PATTON, M. Q. (1980). Qualitative evaluation methods. Beverly Hills, Calif, Sage. | 1980 | 30258 |
| J | THOMPSON, J.D. et al. (1997). The CLUSTAL X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research, 25(24), 4876-4882. DOI: 10.1093/nar/25.24.4876 | 1997 | 30123 |
| B | MASLOW, A. H. (1954). Motivation and personality. New York, Harper & Row. | 1954 | 30095 |
| J | DUBOIS, M. et al. (1956). Colorimetric method for determination of sugars and related substances. Analytical Chemistry, 28(3), 350-356. DOI: 10.1021/ac60111a017 | 1956 | 30045 |
| B | PORTER, M. E. (1985). Competitive advantage: Creating and sustaining superior performance. New York, Free Press. | 1985 | 29924 |
| B | LAZARUS, R. S., & FOLKMAN, S. (1984). Stress, appraisal, and coping. New York, Springer Publishing Company. | 1984 | 29844 |
| J | SHANNON, R.D. (1976). Revised effective ionic radii and systematic studies of interatomic distances in halides and chalcogenides. Acta Crystallographica Section A, 32(5), 751-767. DOI: 10.1107/S0567739476001551 | 1976 | 29796 |
| J | BECKE, A.D. (1988). Density-functional exchange-energy approximation with correct asymptotic behavior. Physical Review A, 38(6), 3098-3100. DOI: 10.1103/PhysRevA.38.3098 | 1988 | 29764 |
| B | COHEN, J., & COHEN, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, N.J., Lawrence Erlbaum Associates. | 1975 | 29609 |
| B | KITTEL, C. (1953). Introduction to solid state physics. New York, John Wiley & Sons, Inc. | 1953 | 29486 |
| B | CARSLAW, H. S., & JAEGER, J. C. (1947). Conduction of heat in solids. Oxford, Clarendon Press. | 1947 | 29426 |
| B | KNUTH, D. E. (1968). The art of computer programming. Reading Mass, Addison-Wesley. | 1968 | 29396 |
| B | MANDELBROT, B. B. (1977). The fractal geometry of nature. New York, W.H. Freeman. | 1977 | 29270 |
| B | LAKOFF, G., & JOHNSON, M. (1980). Metaphors we live by. Chicago, University of Chicago Press. | 1980 | 29211 |
| J | TVERSKY, A. & KAHNEMAN, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science (New York, N.Y.), 185(4157), 1124-1131. DOI: 10.1126/science.185.4157.1124 | 1974 | 29152 |
| J | BLAND, J.M. & ALTMAN, D.G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1(8476), 307-310. DOI: 10.1016/S0140-6736(86)90837-8 | 1986 | 28934 |
| B | BOWLBY, J. (1969). Attachment and loss. London, Hogarth. | 1969 | 28893 |
| B | PUTNAM, R. D. (2000). Bowling alone: the collapse and revival of American community. New York, N.Y. [etc.], Simon & Schuster. | 2000 | 28850 |
| B | GILLIGAN, C. (1982). In a different voice: psychological theory and women's development. Cambridge, Mass, Harvard University Press. | 1982 | 28817 |
| J | KAHNEMAN, D. & TVERSKY, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47(2), 263-291. DOI: 10.2307/1914185 | 1979 | 28812 |
| B | WENGER, E. (1998). Communities of practice: learning, meaning, and identity. Cambridge, Cambridge University Press. | 1998 | 28770 |
| B | WILLIAMSON, O. E. (1975). Markets and hierarchies: analysis and antitrust implications: a study in the economics of internal organization. New York, The Free Press. | 1975 | 28708 |
| J | RAJAGOPAL, A.K. & CALLAWAY, J. (1973). Inhomogeneous electron gas. Physical Review B, 7(5), 1912-1919. DOI: 10.1103/PhysRev.136.B864 | 1973 | 28591 |
| B | NONAKA, I., & TAKEUCHI, H. (1995). The knowledge-creating company: how Japanese companies create the dynamics of innovation. New York, Oxford University Press. | 1995 | 28486 |
| C | KENNEDY, J. & EBERHART, R. (1995). Particle swarm optimization. In Neural Networks, 1995. Proceedings., IEEE International Conference on. IEEE, pp. 1942-1948, vol. 4. | 1995 | 28409 |
| B | STEEL, R. G. D., & TORRIE, J. H. (1960). Principles and procedures of statistics: A biometrical approach. New York, McGraw-Hill. | 1960 | 28346 |
| B | HART, P. E. (1973). Pattern classification and scene analysis: Richard O. Duda,... Peter E. Hart. New York; London; Sydney [etc.], J. Wiley & Sons. | 1973 | 28093 |
| B | PUTNAM, R. D., LEONARDI, R., & NANETTI, R. Y. (1993). Making democracy work: civic traditions in modern Italy. Princeton, Princeton University Press. | 1993 | 27978 |
| B | ERIKSON, E. H. (1950). Childhood and society. New York, Norton. | 1950 | 27930 |
| B | GIDDENS, A. (1984). The constitution of society: outline of the theory of structuration. Berkeley, University of California Press. | 1984 | 27842 |
| B | BUTLER, J. (1990). Gender trouble: feminism and the subversion of identity. London, Routledge. | 1990 | 27818 |
| J | GRANOVETTER, M.S. (1973). The Strength of Weak Ties. American Journal of Sociology, 78(6), 1360. DOI: 10.1086/225469 | 1973 | 27812 |
| B | SAID, E. W. (1978). Orientalism. London, Penguin. | 1978 | 27806 |
| J | METROPOLIS, N. et al. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics, 21(6), 1087-1092. DOI: 10.1063/1.1699114 | 1953 | 27069 |
| B | OLSON, M. (1965). The logic of collective action: public goods and the theory of groups. Harvard, Harvard University Press. | 1965 | 27058 |
| B | BANDURA, A. (1971). Social learning theory. Morristown, N.J., General Learning Press. | 1971 | 27016 |
| B | FOUCAULT, M. (1985). The use of pleasure: volume 2 of the history of sexuality. Harmondsworth, Middlesex, England, Viking. | 1985 | 26955 |
| J | RADLOFF, L.S. (1977). The CES-D Scale: A Self Report Depression Scale for Research in the General Population. Applied Psychological Measurement, 1, 385-401. DOI: 10.1177/014662167700100306 | 1977 | 26787 |
| B | CRESWELL, J. W. (1997). Qualitative inquiry and research design: choosing among five traditions. London, SAGE. | 1997 | 26706 |
| B | ALLEN, M. P., & TILDESLEY, D. J. (1987). Computer simulation of liquids. Oxford, Clarendon Press. | 1987 | 26703 |
| B | CRANK, J. (1956). The mathematics of diffusion. Clarendon Press, Oxford. | 1956 | 26633 |
| B | SCHUMPETER, J. A., & SWEDBERG, R. (1942). Capitalism, socialism and democracy. London, Routledge. | 1942 | 26603 |
| J | FEYERABEND, P. (1955). Wittgenstein's Philosophical Investigations. The Philosophical Review, 64, 449-483. | 1955 | 26576 |
| O | FRISCH, M. et al. (2004). Gaussian 03, revision C.02; Gaussian, Inc., Wallingford, CT, 4. | 2004 | 26531 |
| B | BARD, A. J., & FAULKNER, L. R. (1980). Electrochemical methods: fundamentals and applications. New York, N.Y., J. Wiley and Sons. | 1980 | 26494 |
| B | JAMES, W. (1890). The principles of psychology. New York, H. Holt and Company. | 1890 | 26472 |
| B | PATTON, M. Q. (1980). Qualitative research & evaluation methods. Thousand Oaks, Calif, Sage Publications. | 1980 | 26382 |
| B | MARX, K. (1886). Capital: a critical analysis of capitalist production. London, William Glaisher. | 1886 | 26242 |
| B | NELSON, R. R., & WINTER, S. G. (1982). An evolutionary theory of economic change. Cambridge, Mass, Belknap Press of Harvard University Press. | 1982 | 26145 |
| J | AJZEN, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179-211. DOI: 10.1016/0749-5978(91)90020-T | 1991 | 26144 |
| B | DRAPER, N. R., & SMITH, H. (1966). Applied regression analysis. New York, John Wiley and Sons, Inc. | 1966 | 26109 |
| B | DARWIN, C. (1859). The origin of species by means of natural selection, or, The preservation of favoured races in the struggle for life. London, John Murray, Albemarle Street. | 1859 | 25970 |
| B | AUSTIN, J. L. (1962). How to do things with words. Cambridge, Harvard University Press. | 1962 | 25949 |
| B | EFRON, B., & TIBSHIRANI, R. J. (1993). An introduction to the bootstrap. London, England, Chapman and Hall. | 1993 | 25940 |
| J | OTWINOWSKI, Z. & MINOR, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods in Enzymology, 276, 307-326. DOI: 10.1016/S0076-6879(97)76066-X | 1997 | 25800 |
| B | AMERICAN PUBLIC HEALTH ASSOCIATION. (1900). Standard methods for the examination of water and wastewater. Washington, APHA-AWWA-WPCF. | 1900 | 25714 |
B
 
1977
25644
B
BOURDIEU, P. (1977). Outline of a theory of practice.
Cambridge, U.K., Cambridge University Press.
1977
25613
B
PAULING, L. (1939). The nature of the chemical bond and the
structure of molecules and crystals; an introduction to
modern structural chemistry. Ithaca, N.Y., Cornell
University Press.
1939
25506
J
DIMAGGIO, J. & POWELL, W.W. (1983). The Iron Cage Revisited:
Institutional Isomorphism and Collective Rationality in
Organizational Fields. American Sociological Review, 48(2),
147. DOI: 10.2307/2095101
1983
25488
J
EISENHARDT, K.M. (1989). Building Theories from Case Study
Research. Academy of Management Review, 14(4), 532-
550. DOI:10.2307/258557
1989
25411
B
FESTINGER, L. (1957). A theory of cognitive dissonance.
Stanford, Calif, Stanford university press.
1957
25299
J
FELSENSTEIN, J. (1985). Confidence limits on phylogenies: an
approach using the bootstra Evolution, 783791. DOI:
10.2307/2408678
1985
25221
J
BECK, A.T. et al. (1961) . An inventory for measuring
depression. Archives of general psychiatry, 4, 561-571.
1961
25085
B
VYGOTSKY, L. S. (1962). Thought and language. Cambridge, The
Massachusetts Institute of Technology.
1962
24996
J
COLEMAN, J.S. (1988). Social Capital in the Creation of Human
Capital. American Journal of Sociology, 94(s1), S95.
DOI:10.1086/228943
1988
24994
J
LANDIS, J.R. & KOCH, G.G. (1977). The measurement of observer
agreement for categorical data. Biometrics, 33(1), 159-174.
DOI:10.2307/2529310
1977
24981
B
KOLB, D. A. (1984). Experiential learning: experience as the
source of learning and development. Englewood Cliffs, N.J.,
Prentice-Hall.
1984
24860
B
MCCULLAGH, P., & NELDER, J. (1983). Generalized Linear Models.
London, Chapman and Hall.
1983
24694
B
MILLER, J. H. (1972). Experiments in molecular genetics. New
York, Cold Spring Harbor Laboratory.
1972
24682
B
PATANKAR, S. V. (1980). Numerical heat transfer and fluid flow.
Washington, D.C., New York, Hemisphere. Taylor and
Francis.
1980
24663
J
BLACK, F. & SCHOLES, M. (1973). The Pricing of Options and
Corporate Liabilities. Journal of Political Economy, 81(3),
637. DOI:10.1086/260062
1973
24582
J
GRANOVETTER, M. (1985). Economic action and social structure:
the problem of embeddedness. American Journal of
Sociology, 91(3), 481-510. DOI:10.1086/228311
1985
24324
B
BREIMAN, L. et al. (1984). Classification and regression trees.
Pacific Grove, Calif, Wadsworth & Brooks-Cole advanced
books & software.
1984
24299
J
LOWE, D.G. (2004). Distinctive image features from scale-
invariant keypoints. International Journal of Computer
Vision, 60(2), 91-110. DOI:
10.1023/B:VISI.0000029664.99615.94
2004
24234
B
STRAUSS, A. L., & CORBIN, J. (1998). Basics of qualitative
research: techniques and procedures for developing
grounded theory. Thousand Oaks, SAGE Publications.
1998
24209
B
PAPOULIS, A. (1965). Probability, Random variables, and
stochastic processes. New York: McGraw-Hill.
1965
24099
J
AKAIKE, H. (1974). A new look at the statistical model
identification. IEEE Transactions on Automatic Control,
19(6). DOI:10.1109/TAC.1974.1100705
1974
24061
B
QUINLAN, J. R. (1993). C4.5: Programs for machine learning. San
Mateo, CA: Morgan Kaufmann.
1993
24050
B
LAKOWICZ, J. R. (1983). Principles of fluorescence spectroscopy.
New York: Plenum Press.
1983
23977
B
SWOFFORD, D. (1995). PAUP 4.0 phylogenetic analysis using
parsimony. [Sunderland, Mass.], Sinauer Associates.
1995
23957
B
HORN, R. A., & JOHNSON, C. R. (1985). Matrix analysis.
Cambridge, Cambridge University Press.
1985
23908
B
BECKER, G. S. (1964). Human capital: a theoretical and empirical
analysis, with special reference to education. New York,
National Bureau of Economic Research.
1964
23879
B
MEAD, G. H., & MORRIS, C. W. (1934). Mind, self, and society:
from the standpoint of a social behaviorist. Chicago, The
University of Chicago Press.
1934
23824
J
HARDIN, G. (1968). The Tragedy of the Commons. Science,
162(3859), 1243-1248. DOI:
1968
23737
B
ARGYRIS, C., & SCHÖN, D. A. (1978). Organisational learning: a
theory of action perspective. Reading, Mass. [etc.],
Addison-Wesley Publishing company.
1978
23329
B
CHOMSKY, N. (1965). Aspects of the theory of syntax.
Cambridge, Mass, MIT Press.
1965
23178
J
COASE, R.H. (1960). The Problem of Social Cost. The Journal of
Law and Economics, 3(1), 1. DOI:10.1086/466560
1960
23141
B
BURNHAM, K. P., & ANDERSON, D. R.
(2002). Model selection and multimodel inference: a
practical information-theoretic approach. New York,
Springer.
2002
23046
B
BECK, U., & RITTER, M. (1992). Risk society: towards a new
modernity. London, Sage Publications.
1992
22990
B
GIDDENS, A. (1991). Modernity and self-identity: self and
society in the late modern age. Cambridge, U.K., Polity
Press in association with Basil Blackwell.
1991
22977
J
ENGLE, R.F. & GRANGER, C.W.J. (1987). Co-integration and Error
Correction: Representation, Estimation, and Testing.
Econometrica, 55(2), 251-76. DOI: 10.2307/1913236
1987
22851
B
ZHU, F., WU, R., HU, Y., & JIANG, Z. (1995). Zhu Futang shi yong
er ke xue. Chinamaxx Digital Library. Beijing Shi, Ren min
wei sheng chu ban she.
1995
22810
J
WATTS, D. & STROGATZ, S. (1998). Collective dynamics of
'small-world' networks. Nature, 393(6684), 440-442. DOI:
10.1038/30918
1998
22681
B
FLORY, P. J. (1953). Principles of polymer chemistry. Ithaca, N.Y.,
Cornell Univ. Pr.
1953
22680
J
TAMURA, K. et al. (2007). MEGA4: Molecular Evolutionary
Genetics Analysis (MEGA) software version 4.0. Molecular
Biology and Evolution, 24(8), 1596-1599. DOI:
10.1093/molbev/msm092
2007
22680
B
FALCONER, D. S. (1960). Introduction to quantitative genetics.
Edinburgh, Oliver and Boyd.
1960
22654
B
GRICE, H. P. (1970). Logic and conversation. Cambridge, Mass,
Harvard Univ.
1970
22608
B
RUSSELL, S. J., NORVIG, P., & CANNY, J. F. (1995). Artificial
intelligence: a modern approach. Englewood Cliffs,
Prentice-Hall International.
1995
22577
J
CRONBACH, L.J. (1951). Coefficient alpha and the internal
structure of tests. Psychometrika, 16(3), 297-334. DOI:
1951
22531
B
AJZEN, I., & FISHBEIN, M. (1980). Understanding attitudes and
predicting social behavior. Englewood Cliffs, N.J., Prentice-
Hall.
1980
22419
B
BHABHA, H. K. (1994). The Location of culture. London,
Routledge.
1994
22414
B
COHEN, W.M. & LEVINTHAL, D.A. (1990). Absorptive Capacity: A
New Perspective on Learning and Innovation. Administrative Science
Quarterly, 35(1), 128-152.
1990
22301
B
CASTELLS, M. (1996). The rise of the network society. Oxford,
Blackwell Publishers.
1996
22207
B
DAUBECHIES, I. (1992). Ten lectures on wavelets. Philadelphia,
Pa, Soc. for Industrial and Applied Mathematics.
1992
22165
B
AXELROD, R. (1984). The evolution of cooperation. New York,
Basic Books.
1984
22109
B
AIKEN, L. S., WEST, S. G., & RENO, R. R. (1991). Multiple
regression: testing and interpreting interactions. Newbury
Park, CA, Sage Publications.
1991
22036
B
NAKAMOTO, K. (1970). Infrared and Raman spectra of inorganic
and coordination compounds. New York, Wiley.
1970
22022
J
BENJAMINI, Y. & HOCHBERG, Y. (1995). Controlling the False
Discovery Rate: A Practical and Powerful Approach to
Multiple Testing. Journal of the Royal Statistical Society.
Series B (Methodological), 57(1), 289-300. DOI:
1995
21933
J
PRAHALAD, C.K. & HAMEL, G. (1990). The core competencies of
the corporation. Harvard Business Review, 68(3), 79-91.
1990
21868
B
WITTEN, I. H., & FRANK, E. (1999). Data mining: practical
machine learning tools and techniques with Java
implementations. San Francisco, Calif, Morgan Kaufmann.
1999
21828
B
COLEMAN, J. S. (1990). Foundations of social theory. Cambridge,
Mass, Harvard University Press.
1990
21791
J
ROSS, R. (1999). Atherosclerosis--an inflammatory disease. The
New England journal of medicine, 340(2), 115-126.
DOI:10.1016/S0002-8703(99)70266-8
1999
21741
B
KEYNES, J. M. (1936). The General theory of employment
interest and money. London, Macmillan and Co.
1936
21690
B
BIRD, R. B., STEWART, W. E., & LIGHTFOOT, E. N.
(1960). Transport phenomena. New York, J. Wiley.
1960
21628
B
ISRAELACHVILI, J. N. (1985). Intermolecular and surface forces:
with applications to colloidal and biological systems.
London, Academic Press.
1985
21522
B
BISHOP, C. M. (1995). Neural networks for pattern recognition.
Oxford, UK, Oxford University Press.
1995
21458
B
COTTON, F. A., & WILKINSON, G. (1962). Advanced Inorganic Chemistry.
London, Wiley.
1962
21450
J
KRESSE, G., & FURTHMÜLLER, J. (1996). Efficient iterative
schemes for ab initio total-energy calculations using a
plane-wave basis set. Physical Review B, 54(16), 11169-11186.
DOI: 10.1103/PhysRevB.54.11169
1996
21248
J
MARQUARDT, D.W. (1963). An Algorithm for Least-Squares
Estimation of Nonlinear Parameters. Journal of the Society
for Industrial and Applied Mathematics, 11(2), 431-441.
DOI:
1963
21246
B
HOFSTEDE, G. (1991). Cultures and organizations: software of
the mind. London, McGraw-Hill Book Company.
1991
21232
J
MCKHANN, G. et al. (1984). Clinical diagnosis of Alzheimer's
disease: report of the NINCDS-ADRDA Work Group under
the auspices of Department of Health and Human Services
Task Force on Alzheimer's Disease. Neurology, 34(7),
939-944.
1984
21172
B
BRYK, A. S., & RAUDENBUSH, S. W. (1992). Hierarchical linear
models: Applications and data analysis methods. London,
Sage Publications.
1992
21134
B
DOWNS, A. (1957). An economic theory of democracy. New
York, Harper.
1957
21017
J
WARE, J.E. & SHERBOURNE, C.D. (1992). The MOS 36-item
short-form health survey (SF-36). I. Conceptual framework
and item selection. Medical care, 30(6), 473-483. DOI:
10.1097/00005650-199206000-00002
1992
20975
B
HAN, J., & KAMBER, M. (2000). Data mining: concepts and
techniques. London, Harcourt Publishers, a subsidiary of
Harcourt International Ltd.
2000
20974
B
MONTGOMERY, D. C. (1976). Design and analysis of
experiments. New York, John Wiley & Sons.
1976
20905
J
REYNOLDS, E. S. (1963). The use of lead citrate at high pH as an
electron-opaque stain in electron microscopy. The Journal
of cell biology, 17(1), 208-212. DOI: 10.1083/jcb.17.1.208
1963
20890
B
BRONFENBRENNER, U. (1979). The ecology of human
development: experiments by nature and design.
Cambridge, Mass. & London, Harvard University Press.
1979
20865
J
SCHWARZ, G. (1978). Estimating the Dimension of a Model. The
Annals of Statistics, 6(2), 461-464. DOI:
10.1214/aos/1176344136
1978
20828
B
GOODMAN, L. S., & GILMAN, A. (1941). The pharmacological
basis of therapeutics: a textbook of pharmacology,
toxicology and therapeutics for physicians and medical
students. New York, Macmillan.
1941
20827
B
NIELSEN, M. A., & CHUANG, I. (2000). Quantum computation
and quantum information. Cambridge, U.K., Cambridge
University Press.
2000
20803
B
        
Editions L'Harmattan.
1994
20745
B
ZIENKIEWICZ, O. C. (1971). The finite element method in
engineering science. London, McGraw-Hill.
1971
20733
J
FEINBERG, A. & VOGELSTEIN, B. (1983). A technique for
radiolabeling DNA restriction endonuclease fragments to
high specific activity. Analytical biochemistry, 132(1), 6-13.
DOI: 10.1016/0003-2697(83)90418-9
1983
20710
B
BAHTIN, M. M., & HOLQUIST, M. (1981). The dialogic
imagination: four essays. Austin, University of Texas Press.
1981
20681
B
FELSENSTEIN, J. (1989). PHYLIP: Phylogeny Inference Package
version 3.2 manual. [Seattle], University of Washington.
1989
20666
J
HU, L. & BENTLER, P. M. (1999). Cutoff criteria for fit indexes in
covariance structure analysis: Conventional criteria versus
new alternatives. Structural Equation Modeling: A
Multidisciplinary Journal, 6(1), 1-55. DOI:
10.1080/10705519909540118
1999
20619
B
VON NEUMANN, J., & MORGENSTERN, O. (1944). Theory of
games and economic behavior. Princeton, Princeton Univ.
Press.
1944
20582
J
FRIEDEWALD, W.T., LEVY, R.I. & FREDRICKSON, D.S. (1972).
Estimation of the concentration of low-density lipoprotein
cholesterol in plasma, without use of the preparative
ultracentrifuge. Clinical Chemistry, 18(6), 499-502.
1972
20476
B
MERTON, R. K. (1949). Social theory and social structure. N.Y.,
Free Press.
1949
20470
J
WEBER, K. & OSBORN, M. (1969). The Reliability of Molecular
Weight Determinations by Dodecyl Sulfate-Polyacrylamide
Gel Electrophoresis. Journal of Biological Chemistry,
244(16), 4406-4412. DOI:
1969
20435
J
SCHEIN, E.H. (1984). Coming to a New Awareness of
Organizational Culture. Sloan Management Review, 25(2),
3.
1984
20341
B
STEEL, R. G. D., & TORRIE, J. H. (1960). Principles and procedures
of statistics: A biometrical approach. New York, McGraw-
Hill.
1960
20299
B
BOYD, S. P., & VANDENBERGHE, L. (2004). Convex optimization.
Cambridge, Cambridge University Press.
2004
20261
J
CHIRGWIN, J.M. et al. (1979). Isolation of biologically active
ribonucleic acid from sources enriched in ribonuclease.
Biochemistry, 18(24), 5294-5299. DOI:
10.1021/bi00591a005
1979
20253
J
DAVIS, B.J. (1964). Disc electrophoresis - II: method and
application to human serum proteins. Annals of the
New York Academy of Sciences, 121(2), 404-427.
DOI:10.1111/j.1749-6632.1964.tb14213.x
1964
20239
J
LUCAS, R.E. (1988). On the mechanics of economic
development. Journal of Monetary Economics, 22(1), 3-42.
DOI:10.1016/0304-3932(88)90168-7
1988
20233
B
KENDALL, M. G. (1943). The Advanced theory of statistics. Vol. II
& Vol. III. London, Charles Griffin.
1943
20197
B
FLEISS, J. L. (1973). Statistical methods for rates and
proportions. New York, John Wiley & Sons.
1973
20196
B
ROSENBERG, M. (1965). Society and the adolescent self-image.
Princeton, N.J., Princeton University Press.
1965
20191
J
FORNELL, C. & LARCKER, D.F. (1981). Evaluating Structural
Equation Models with Unobservable Variables and
Measurement Error. Journal of Marketing Research,
18(1), 39-50. DOI:10.2307/3151312
1981
20042
B
FISHBEIN, M., & AJZEN, I. (1975). Belief, attitude, intention and
behavior: an introduction to theory and research. Reading,
Mass. ; Don Mills, Ont, Addison-Wesley.
1975
20030
B
HEBB, D. O. (1949). The organization of behavior: a
neuropsychological approach. New York, NY, John Wiley &
Sons.
1949
19954
B
GARDNER, H. (1983). Frames of mind: the theory of multiple
intelligences. New York, Basic books.
1983
19928
J
MONKHORST, H. J., & PACK, J. D. (1976). Special points for
Brillouin-zone integrations. Physical Review B, 13, 5188-5192.
DOI: 10.1103/PhysRevB.13.5188
1976
19904
J
CULLITY, B.D. (1957). Elements of X-Ray Diffraction. American
Journal of Physics, 25(6), 394. DOI:
1957
19875
J
GRYNKIEWICZ, G., POENIE, M. & TSIEN, R.Y. (1985). A new
generation of Ca2+ indicators with greatly improved
fluorescence properties. Journal of Biological Chemistry,
260(6), 3440-3450.
1985
19871
J
BARABÁSI, A.L., & ALBERT, R. (1999). Emergence of Scaling in
Random Networks. Science, 286, 509. DOI:
10.1126/science.286.5439.509
1999
19806
B
KEETON, W. P., & PROSSER, W. L. (1941). Prosser and Keeton on
the law of torts. St. Paul, Minn, West publishing.
1941
19725
B
MITRA, G. (1988). Mathematical models for decision support:
Advanced study institute : Papers.
1988
19713
B
WOOLDRIDGE, J. M. (2002). Econometric analysis of cross
section and panel data. Cambridge, Mass, MIT Press.
2002
19698
J
HAMILTON, M. (1960). A rating scale for depression. Journal of
neurology, neurosurgery, and psychiatry, 23, 56-62. DOI:
10.1136/jnnp.23.1.56
1960
19690
B
PEARL, J. (1988). Probabilistic reasoning in intelligent systems:
networks of plausible inference. San Mateo, Calif, Morgan
Kaufmann Publishers.
1988
19578
B
GIBSON, J. J. (1979). The ecological approach to visual
perception. Boston, Houghton Mifflin company.
1979
19555
B
DEWEY, J. (1916). Democracy and Education : An introduction to
the philosophy of education. New York, The Macmillan
Company.
1916
19420
J
DAVIS, F.D. (1989). Perceived usefulness, perceived ease of use,
and user acceptance of information technology. MIS
Quarterly, 13(3), 319-340. DOI: 10.2307/249008
1989
19355
B
POPPER, K. R. (1959). The logic of scientific discovery. London,
Hutchinson.
1959
19325
B
ESPING-ANDERSEN, G. (1990). The three worlds of welfare
capitalism. Cambridge, Polity Press.
1990
19311
B
HASTIE, T., TIBSHIRANI, R., & FRIEDMAN, J. H.
(2001). The elements of statistical learning: data mining,
inference, and prediction. New York, Springer.
2001
19242
B
PENROSE, E. T. (1959). The theory of growth of the firm. Oxford,
Basil Blackwell.
1959
19180
B
RAPPAPORT, T. S. (1996). Wireless communications: principles
and practice. Upper Saddle River (New Jersey), Prentice Hall
PTR.
1996
19175
B
BLOOM, B. S. (1956). Taxonomy of educational objectives : the
classification of educational goals. Handbook 1.
New York, McKay.
1956
19139
B
JOLLIFFE, I. T. (1986). Principal component analysis. New York,
Springer-Verlag.
1986
19120
B
WEINER, I. B., & CRAIGHEAD, W. E. (1984). The Corsini
encyclopedia of psychology. Hoboken, NJ, Wiley.
1984
19119
J
COHEN, J. (1960). A coefficient of agreement for nominal scales.
Educational and Psychological Measurement, 20(1), 37-46.
DOI: 10.1177/001316446002000104
1960
19116
B
LEZAK, M. D. (1976). Neuropsychological assessment. New York,
Oxford University Press.
1976
19081
B
MILLER, G. A. (1956). The magical number seven, plus or minus
two: some limits on our capacity for processing
information. Indiana, Bobbs-Merrill. DOI: 10.1037//0033-295X.101.2.343
1956
19070
B
HARVEY, D. (1989). The condition of postmodernity: an enquiry
into the origins of cultural change. Oxford ; Cambridge,
Mass, Blackwell.
1989
19053
J
ROGERS, S. (1996). Adaptive filter theory. Control Engineering
Practice, 4(11), 1629-1630.
1996
19012
J
CANNY, J. (1986). A computational approach to edge detection.
IEEE transactions on pattern analysis and machine
intelligence, 8(6), 679-698.
1986
18958
J
RABINER, L.R. (1989). A tutorial on hidden Markov models and
selected applications in speech recognition. Proceedings of
the IEEE, 77(2), 257-286. DOI: 10.1109/5.18626
1989
18920
B
BOLLEN, K.A. (1998). Structural Equation Models. Encyclopedia
of Biostatistics. 7.
1998
18905
J
WHITE, H. (1980). A heteroskedasticity-consistent covariance
matrix estimator and a direct test for
heteroskedasticity. Econometrica, 48(4), 817-838.
DOI:10.2307/1912934
1980
18878
B
MARCH, J. G., & SIMON, H. A. (1958). Organizations. New York,
Wiley.
1958
18835
J
O'FARRELL, P. H. (1975). High resolution two-dimensional
electrophoresis of proteins. The Journal of biological
chemistry, 250(10), 4007-4021.
1975
18835
B
POLANYI, M. (1966). The tacit dimension. Garden City, N. Y,
Doubleday.
1966
18818
J
HANAHAN, D., & WEINBERG, R. A. (2000). The Hallmarks of
Cancer. Cell. 100, 57. DOI:10.1016/S0092-8674(00)81683-9
2000
18740
J
NELDER, J.A. & MEAD, R. (1965). A Simplex Method for Function
Minimization. The Computer Journal, 7(4), 308-313. DOI:
1965
18732
B
SEN, A. (1999). Development as freedom. Oxford, Oxford
University Press.
1999
18728
B
GOFFMAN, E. (1963). Stigma; notes on the management of
spoiled identity. Englewood Cliffs, N.J., Prentice-Hall.
1963
18727
J
AKERLOF, G. A. (1970). The Market for "Lemons": Quality
Uncertainty and the Market Mechanism. The Quarterly
Journal of Economics, 84(3), 488-500.
DOI:10.2307/1879431
1970
18684
J
DUNCAN, D. (1955). Multiple range and multiple F
tests. Biometrics, 11(1), 1-42. DOI:10.2307/3001478
1955
18682
B
LYOTARD, J.F., BENNINGTON, G., MASSUMI, B., & JAMESON, F.
(1984). The postmodern condition: a report on knowledge.
Manchester, Manchester University Press.
1984
18672
B
FOUCAULT, M. (1972). The archaeology of knowledge: and the
discourse on language. New York, Pantheon Books.
1972
18668
J
MALLAT, S.G. (1989). Theory for multiresolution signal
decomposition: the wavelet representation. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
11(7), 674-693. DOI: 10.1109/34.192463
1989
18662
B
FOUCAULT, M., & GORDON, C. (1980). Power/knowledge:
selected interviews and other writings, 1972-1977.
Brighton, Sussex, Harvester Press.
1980
18643
B
KLINE,