-
ABSTRACT: With over 20 million records, the ADS citation database is regularly used by researchers and librarians to measure the scientific impact of individuals, groups, and institutions. In addition to the traditional sources of citations, the ADS has recently added references extracted from the arXiv e-prints on a nightly basis. We review the procedures used to harvest and identify the reference data used in the creation of citations, as well as the policies and procedures that we follow to avoid double-counting and to eliminate contributions that may not be scholarly in nature. Finally, we describe how users and institutions can easily obtain quantitative citation data from the ADS, both interactively and via web-based programming tools. The ADS is available at http://ads.harvard.edu.
11/2006;
-
ABSTRACT: We discuss two techniques used to characterize bibliographic records based on their similarity to and relationship with the contents of the NASA Astrophysics Data System (ADS) databases. The first method has been used to classify input text as being relevant to one or more subject areas based on an analysis of the frequency distribution of its individual words. The second method has been used to classify existing records as being relevant to one or more databases based on the distribution of the papers citing them. Both techniques have proven to be valuable tools in assigning new and existing bibliographic records to different disciplines within the ADS databases.
12/2005;
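The first technique above, classifying input text by the frequency distribution of its words, might be sketched roughly as follows. This is an illustrative stand-in only (a smoothed log-frequency scorer), not the ADS implementation; the function names, the add-one smoothing, and the toy training texts are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_texts):
    """Build per-subject word counts from (subject, text) pairs."""
    counts = {}
    for subject, text in labeled_texts:
        counts.setdefault(subject, Counter()).update(tokenize(text))
    return counts


def score(counts, text):
    """Return a log-likelihood-style score of the text for each subject.

    Uses add-one smoothing so unseen words do not zero out a subject
    (an assumption of this sketch, not taken from the abstract).
    """
    vocab_size = len(set().union(*counts.values()))
    words = tokenize(text)
    scores = {}
    for subject, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[subject] = sum(
            math.log((word_counts[w] + 1) / (total + vocab_size)) for w in words
        )
    return scores


def classify(counts, text):
    """Pick the subject whose word distribution best matches the text."""
    scores = score(counts, text)
    return max(scores, key=scores.get)


# Toy training data, invented for illustration.
TRAINING = [
    ("astronomy", "galaxy star cluster redshift star"),
    ("physics", "quantum field lattice gauge field"),
]
MODEL = train(TRAINING)
```

Calling `classify(MODEL, "star cluster survey")` selects the subject whose training vocabulary overlaps most with the input; a real system would need far larger training sets and could assign a record to more than one subject area.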
-
ABSTRACT: It has been shown (Lawrence, S. (2001). Online or invisible? Nature, 411, 521) that journal articles which have been posted without charge on the internet are more heavily cited than those which have not been. Using data from the NASA Astrophysics Data System (ads.harvard.edu) and from the ArXiv e-print archive at Cornell University (arXiv.org) we examine the causes of this effect.
Information Processing & Management. 04/2005;
-
ABSTRACT: Digital libraries such as the NASA Astrophysics Data System (Kurtz et al., 2005) permit the easy accumulation of a new type of bibliometric measure, the number of electronic accesses (“reads”) of individual articles. We explore various aspects of this new measure. We examine the obsolescence function as measured by actual reads and show that it can be well fit by the sum of four exponentials with very different time constants. We compare the obsolescence function as measured by readership with the obsolescence function as measured by citations. We find that the citation function is proportional to the sum of two of the components of the readership function. This proves that the normative theory of citation is true in the mean. We further examine in detail the similarities and differences among the citation rate, the readership rate, and the total citations for individual articles, and discuss some of the causes. Using the number of reads as a bibliometric measure for individuals, we introduce the read–cite diagram to provide a two-dimensional view of an individual's scientific productivity. We develop a simple model to account for an individual's reads and cites and use it to show that the position of a person in the read–cite diagram is a function of age, innate productivity, and work history. We show the age biases of both reads and cites and develop two new bibliometric measures which have substantially less age bias than citations: SumProd, a weighted sum of total citations and the readership rate, intended to show the total productivity of an individual; and Read10, the readership rate for articles published in the last 10 years, intended to show an individual's current productivity. We also discuss the effect of normalization (dividing by the number of authors on a paper) on these statistics. We apply SumProd and Read10 using new, nonparametric techniques to compare the quality of different astronomical research organizations.
Journal of the American Society for Information Science and Technology 01/2005; 56(2):111 - 128. · 2.08 Impact Factor
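The obsolescence function described above, a sum of four exponentials with very different time constants, has a simple functional form that can be sketched as below. The amplitudes and time constants here are placeholders chosen for illustration; they are not the fitted values from the paper, and the function name is hypothetical.

```python
import math


def reads_model(age_years, amplitudes, time_constants):
    """Evaluate a sum of decaying exponentials at a given article age.

    Each component contributes amplitude * exp(-age / time_constant).
    """
    if len(amplitudes) != len(time_constants):
        raise ValueError("need one time constant per amplitude")
    return sum(
        a * math.exp(-age_years / tau)
        for a, tau in zip(amplitudes, time_constants)
    )


# Placeholder parameters (assumptions, NOT values from the paper):
# four components whose time constants differ by orders of magnitude.
AMPLITUDES = (8.0, 4.0, 2.0, 1.0)
TIME_CONSTANTS = (0.1, 1.0, 10.0, 100.0)  # years

for age in (0, 1, 5, 25):
    print(age, reads_model(age, AMPLITUDES, TIME_CONSTANTS))
```

With widely separated time constants, the short-lived components dominate the readership of new articles while the long-lived ones set the slowly decaying tail for older literature.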
-
ABSTRACT: The NASA Astrophysics Data System (ADS), along with astronomy's journals and data centers (a collaboration dubbed URANIA), has developed a distributed online digital library which has become the dominant means by which astronomers search, access, and read their technical literature. Digital libraries permit the easy accumulation of a new type of bibliometric measure: the number of electronic accesses (“reads”) of individual articles. By combining data from the text, citation, and reference databases with data from the ADS readership logs we have been able to create second-order bibliometric operators, a customizable class of collaborative filters that permits substantially improved accuracy in literature queries. Using the ADS usage logs along with membership statistics from the International Astronomical Union and data on the population and gross domestic product (GDP), we have developed an accurate model for worldwide basic research where the number of scientists in a country is proportional to the GDP of that country, and the amount of basic research done by a country is proportional to the number of scientists in that country times that country's per capita GDP. We introduce the concept of utility time to measure the impact of the ADS/URANIA and the electronic astronomical library on astronomical research. We find that in 2002 it amounted to the equivalent of 736 full-time researchers, or $250 million, or the astronomical research done in France.
Journal of the American Society for Information Science and Technology 12/2004; 56(1):36 - 45. · 2.08 Impact Factor
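The two proportionalities stated in the abstract above can be combined into a single relation. The symbols are introduced here for illustration and are not taken from the paper: $N_s$ is the number of scientists in a country, $G$ its GDP, $P$ its population, and $R$ the amount of basic research it performs.

```latex
\begin{align}
  N_s &\propto G, \\
  R   &\propto N_s \cdot \frac{G}{P}
       \;\;\Longrightarrow\;\;
  R \propto \frac{G^{2}}{P}.
\end{align}
```

Under these assumptions, two countries with equal GDP but different populations would differ in research output in inverse proportion to their populations.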
-
CoRR. 01/2004; cs.DL/0401028.
-
ABSTRACT: We describe a system used by the NASA Astrophysics Data System to identify bibliographic references obtained from scanned article pages by OCR methods with records in a bibliographic database. We analyze the process generating the noisy references and conclude that the three-step procedure of correcting the OCR results, parsing the corrected string and matching it against the database provides unsatisfactory results. Instead, we propose a method that allows a controlled merging of correction, parsing and matching, inspired by dependency grammars. We also report on the effectiveness of various heuristics that we have employed to improve recall.
12/2003: pages 521-530;