Article

The scholarly communication of economic knowledge: a citation analysis of Google Scholar


Abstract

Citation counts can be used as a proxy to study the scholarly communication of knowledge and the impact of research in academia. Previous research has addressed several important determinants of citation counts. In this study, we investigate whether quantitative patterns lie behind citations, and thus provide a detailed analysis of the factors behind successful research. We conduct quantitative analyses of how various features of a published scientific article, such as the author’s quality, the journal’s impact factor, and the publishing year, affect the number of citations it receives. We carried out full-text searches in Google Scholar to obtain our data set on citation counts. The data set is then arranged into panels and analyzed with a negative binomial regression. Our results show that attributes such as the author’s quality and the journal’s impact factor contribute substantially to an article’s citations. In addition, an article’s citation count depends not only on its own properties, as mentioned above, but also on the quality, as measured by the number of citations, of the articles it cites. This study thus characterizes statistically how different features of an article affect its number of citations, and provides statistical evidence that the number of citations of a scientific article depends on the number of citations of the articles it cites.
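
As a rough illustration of the model class named in the abstract, a negative binomial regression for citation counts could be written as below. This is a sketch under assumptions, not the authors' specification: the log link, the NB2 variance form, and every symbol are introduced here for illustration only ($c_{it}$ is the citation count of article $i$ in panel period $t$, $q_i$ the author's quality, $\mathit{IF}_i$ the journal's impact factor, $p_i$ the publishing year, $r_i$ the citations of the articles that $i$ cites, and $\alpha$ the dispersion parameter).

```latex
\begin{align}
\mathbb{E}[c_{it} \mid x_{it}] &= \mu_{it}
  = \exp\bigl(\beta_0 + \beta_1 q_i + \beta_2 \mathit{IF}_i + \beta_3 p_i + \beta_4 r_i\bigr), \\
\operatorname{Var}[c_{it} \mid x_{it}] &= \mu_{it} + \alpha\,\mu_{it}^{2}, \qquad \alpha \ge 0 .
\end{align}
```

In this form, $\alpha = 0$ gives a variance equal to the mean (the Poisson case), while $\alpha > 0$ lets the variance exceed the mean.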


... This phenomenon is known as overdispersion and it violates equidispersion (i.e., mean and variance coincide) as an important assumption for valid statistical inference based on the Poisson model (Hilbe 2011). This issue has been acknowledged in numerous scientometric studies (Didegah and Thelwall 2013; Ketzler and Zimmermann 2013; Sun and Xia 2016) with some of them dating back to the early 1980s (Cohen 1981). Underdispersion, however, has been less of an issue in the scientometric literature. ...
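A quick way to see whether count data depart from equidispersion is to compare the sample variance with the sample mean. The following is a minimal sketch; the function name `dispersion_index` and the citation counts are hypothetical and not taken from any of the studies cited above.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance divided by sample mean (about 1 under equidispersion)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical citation counts for ten papers (illustrative only).
citations = [0, 1, 1, 2, 3, 5, 8, 13, 40, 120]
ratio = dispersion_index(citations)
label = "overdispersed" if ratio > 1 else "underdispersed or equidispersed"
print(f"variance/mean = {ratio:.1f} ({label})")
```

A ratio well above 1 points to overdispersion and a ratio well below 1 to underdispersion; this is only a descriptive check, not a formal test.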
Article
Item-response models from the psychometric literature have been proposed for the estimation of researcher capacity. Canonical items that can be incorporated in such models to reflect researcher performance are count data (e.g., number of publications, number of citations). Count data can be modeled by Rasch's Poisson counts model that assumes equidispersion (i.e., mean and variance must coincide). However, the mean can be a) larger than the variance (i.e., underdispersion), or b) smaller than the variance (i.e., overdispersion). Ignoring the presence of overdispersion (underdispersion) can cause standard errors to be liberal (conservative), when the Poisson model is used. Indeed, number of publications or number of citations are known to display overdispersion. Underdispersion, however, is far less acknowledged in the literature. In the current investigation the flexible Conway-Maxwell-Poisson count model is used to examine reliability estimates of capacity in relation to various dispersion patterns. It is shown that reliability of capacity estimates of inventors drops from .84 (Poisson) to .68 (Conway-Maxwell-Poisson) or .69 (negative binomial). Moreover, with some items displaying overdispersion and some items displaying underdispersion, the dispersion pattern in a reanalysis of Mutz and Daniel's (2018) researcher data was found to be more complex as compared to previous results. To conclude, a careful examination of competing models including the Conway-Maxwell-Poisson count model should be undertaken prior to any evaluation and interpretation of capacity reliability. Moreover, this work shows that count data psychometric models are well suited for decisions with a focus on top researchers, because conditional reliability estimates (i.e., reliability depending on the level of capacity) were highest for the best researchers.
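To make the dispersion patterns concrete, the sketch below evaluates the Conway-Maxwell-Poisson probability mass function, with weights proportional to lambda^y / (y!)^nu, and reports the variance-to-mean ratio for three values of nu. The function names, the truncation point `max_count`, and the parameter values are assumptions chosen for illustration; they are not taken from the study.

```python
import math


def cmp_pmf(lam, nu, max_count=200):
    """Conway-Maxwell-Poisson probabilities for counts 0..max_count.

    The infinite normalising sum is truncated at max_count, which is an
    approximation that is adequate for the small lam used here.
    """
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_count + 1)]
    top = max(log_w)                          # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a distribution given as probabilities over 0, 1, 2, ..."""
    m = sum(y * p for y, p in enumerate(pmf))
    v = sum((y - m) ** 2 * p for y, p in enumerate(pmf))
    return m, v


for nu in (0.5, 1.0, 2.0):
    m, v = mean_and_variance(cmp_pmf(lam=3.0, nu=nu))
    print(f"nu={nu}: mean={m:.3f}, variance={v:.3f}, variance/mean={v / m:.3f}")
```

With nu = 1 the distribution coincides with the Poisson (variance equal to mean), nu < 1 gives overdispersion, and nu > 1 gives underdispersion.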
... Humans are social animals (as famously noted by Aristotle) and our opinions about something are heavily influenced by the opinions of our peers. Thus, the current setting is based on the principle that a weight factor to the probability of an article to be cited is higher if the current number of citations of this article is greater [24]. The effect of the number of times an article has already been cited to the probability of the article to be cited in our model is depicted by a function y = f(x), where y is a multiplying factor to the ultimate probability, which follows several qualitative principles: ...
... Here we assume that the probability of an article to be cited is higher if the current number of citations of this article is greater [22]. The effect of the number of times an article has already been cited to the probability of the article to be cited in our model is depicted by a function y = f(x), where y is a multiplying factor to the ultimate probability, which follows several qualitative principles: ...
Article
A simple abstract model is developed as a parallel experimental basis for the aim of exploring the differences of journal impact factors, particularly between different disciplines. Our model endeavors to simulate the publication and citation behaviors of the articles in the journals belonging to a similar discipline, in a distributed manner. Based on simulation experiments, the mechanism of influence from several fundamental factors to the trend of impact factor is revealed. These factors include the average review cycle, average number of references and yearly distribution of references. Moreover, satisfactory approximation could possibly be observed between certain actual data and simulation results.
Article
Purpose “Scholarly Communication” is a frequent topic of both the professional and research literature of Library and Information Science (LIS). Despite efforts by individuals (e.g. Borgman, 1989) and organizations such as the Association of College and Research Libraries (ACRL) to define the term, multiple understandings of it remain. Discussions of scholarly communication infrequently offer a definition or explanation of its parameters, making it difficult for readers to form a comprehensive understanding of scholarly communication and associated phenomena. Design/methodology/approach This project uses the evolutionary concept analysis (ECA) method developed by nursing scholar Beth L. Rodgers to explore “Scholarly Communication” as employed in the literature of LIS. As the purpose of ECA is not to arrive at “the” definition of a term but rather to explore its utilization within a specific context, it is an ideal approach to expand our understanding of scholarly communication as used in LIS research. Findings “Scholarly Communication” as employed in the LIS literature does not refer to a single phenomenon or idea, but rather is a concept with several dimensions and sub-dimensions with distinct, but overlapping, significance. Research limitations/implications The concept analysis (CA) method calls for review of a named concept, i.e. verbatim. Therefore, the items included in the data set must include the phrase “scholarly communication”. Items using alternate terminology were excluded from analysis. Practical implications The model of scholarly communication presented in this paper provides language to operationalize the concept. Originality/value LIS lacks a nuanced understanding of “scholarly communication” as used in the LIS literature. This paper offers a model to further the field's collective understanding of the term and support operationalization for future research projects.
Article
Cross-national distance among countries has been of central interest in International Business and Management research. Therefore, different efforts have been made to develop models/measurements to address this issue. In this article we identify the models/measurements of cross-national distance developed since the beginning of the 2000s. After briefly presenting each model’s distinctive features, we assess their impact on the research field based on a wide range of bibliometric techniques (direct, indirect, and adjusted citation impacts, altmetrics, academic reviews, journals and publishers’ prestige). Our analysis shows that the narrower cultural distance construct has lost ground to the wider psychic distance one. Furthermore, researchers highly value those models and measurements that go beyond the cultural and psychic distance constructs, providing a multidimensional framework to analyze and measure cross-national distance among countries. Our analysis of these models’ impact shows that this is a salient issue in the research field as a whole and a central topic in the highest ranked journals in International Business and Management.
Article
Purpose The purpose of this paper is to understand how Chinese library and information science (LIS) journal articles cite works from outside the discipline (WOD) to identify the impact of knowledge import from outside the discipline on LIS development. Design/methodology/approach This paper explores the Chinese LIS’ preferences in citing WOD by employing bibliometrics and machine learning techniques. Findings Chinese LIS citations to WOD account for 29.69 percent of all citations, and they rise over time. Computer science, education and communication are the most frequently cited disciplines. Under the categorization of the Biglan model, Chinese LIS prefers to cite WOD from soft science, applied science or nonlife science. In terms of community affiliation, the cited authors are mostly from the academic community, but rarely from the practice community. Mass media has always been a citation source that is hard to ignore. There is a strong interest of Chinese LIS in citing emerging topics. Practical implications This paper can be implemented in the reformulation of the Chinese LIS knowledge system, the promotion of interdisciplinary collaboration, the development of LIS library collections and faculty advancement. It may also be used as a reference to develop strategies for the global LIS. Originality/value This paper fills the research gap in analyzing citations to WOD from Chinese LIS articles and their impacts on LIS, and recommends that Chinese LIS should emphasize knowledge of both technology and people as well as knowledge from the practice community, and cooperate with partners from other fields, so as to produce knowledge that meets the demands of library and information practice as well as users.
Article
In actuarial literature, researchers suggested various statistical procedures to estimate the parameters in claim count or frequency model. In particular, the Poisson regression model, which is also known as the Generalized Linear Model (GLM) with Poisson error structure, has been widely used in the recent years. However, it is also recognized that the count or frequency data in insurance practice often display overdispersion, i.e., a situation where the variance of the response variable exceeds the mean. Inappropriate imposition of the Poisson may underestimate the standard errors and overstate the significance of the regression parameters, and consequently, giving misleading inference about the regression parameters. This paper suggests the Negative Binomial and Generalized Poisson regression models as alternatives for handling overdispersion. If the Negative Binomial and Generalized Poisson regression models are fitted by the maximum likelihood method, the models are considered to be convenient and practical; they handle overdispersion, they allow the likelihood ratio and other standard maximum likelihood tests to be implemented, they have good properties, and they permit the fitting procedure to be carried out by using the Iterative Weighted Least Squares (IWLS) regression similar to those of the Poisson. In this paper, two types of regression model will be discussed and applied; multiplicative and additive. The multiplicative and additive regression models for Poisson, Negative Binomial and Generalized Poisson will be fitted, tested and compared on three different sets of claim frequency data; Malaysian private motor third party property damage data, ship damage incident data from McCullagh and Nelder, and data from Bailey and Simon on Canadian private automobile liability.
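The iterative weighted least squares idea mentioned above can be sketched for the simplest case, a multiplicative (log-link) Poisson regression with an intercept and one covariate. This is an illustrative sketch only: the function name `poisson_iwls`, the fixed iteration count, and the data are assumptions, and the Negative Binomial and Generalized Poisson variants discussed in the abstract are not implemented here.

```python
import math


def poisson_iwls(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by iterative weighted least squares.

    Assumes the x values are not all identical, otherwise the 2x2 system is singular.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(iterations):
        # Accumulate the weighted normal equations (X'WX) b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            z = eta + (yi - mu) / mu          # working response
            w = mu                            # Poisson weight under the log link
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


# Hypothetical rating levels and claim counts (illustrative only).
levels = [0, 1, 2, 3, 4, 5]
claims = [1, 2, 2, 4, 6, 7]
b0, b1 = poisson_iwls(levels, claims)
print(f"intercept={b0:.4f}, slope={b1:.4f}")
```

Each pass solves a weighted least squares problem on the working response, which for the Poisson model with log link is equivalent to a Newton step on the log-likelihood.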
Article
Traditionally, the most commonly used source of bibliometric data is Thomson ISI Web of Knowledge, in particular the Web of Science and the Journal Citation Reports (JCR), which provide the yearly Journal Impact Factors (JIF). This paper presents an alternative source of data (Google Scholar, GS) as well as 3 alternatives to the JIF to assess journal impact (h-index, g-index and the number of citations per paper). Because of its broader range of data sources, the use of GS generally results in more comprehensive citation coverage in the area of management and international business. The use of GS particularly benefits academics publishing in sources that are not (well) covered in ISI. Among these are books, conference papers, non-US journals, and in general journals in the field of strategy and international business. The 3 alternative GS-based metrics showed strong correlations with the traditional JIF. As such, they provide academics and universities committed to JIFs with a good alternative for journals that are not ISI-indexed. However, we argue that these metrics provide additional advantages over the JIF and that the free availability of GS allows for a democratization of citation analysis as it provides every academic access to citation data regardless of their institution's financial means.
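The three alternative metrics named above can be computed from a plain list of per-paper citation counts. The sketch below uses the usual definitions (h is the largest number such that h papers each have at least h citations; g is the largest number such that the top g papers together have at least g squared citations, here capped at the number of papers). The function names and the sample counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g (at most the number of papers) whose top g papers total >= g*g citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def citations_per_paper(citations):
    """Mean number of citations per paper (0.0 for an empty list)."""
    return sum(citations) / len(citations) if citations else 0.0


# Hypothetical citation counts for the papers of one journal (illustrative only).
journal = [10, 8, 5, 4, 3]
print(h_index(journal), g_index(journal), citations_per_paper(journal))
```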
Article
Purpose More knowledge about open access (OA) scholarly publishing on the web would be helpful for citation data mining and the development of web‐based citation indexes. Hence, the main purpose of this study is to identify common characteristics of open access publishing, which may therefore enable us to measure different aspects of e‐research on the web. Design/methodology/approach In the current study, five characteristics of 545 OA citing sources targeting OA research articles in four science and four social science disciplines were manually identified, including file format, hyperlinking, internet domain, language and publication year. Findings About 60 per cent of the OA citing sources targeting research papers were in PDF format, 30 per cent were from academic domains ending in edu and ac and 70 per cent of the citations were not hyperlinked. Moreover, 16 per cent of the OA citing sources targeting studied papers in the eight selected disciplines were in non‐English languages. Additional analyses revealed significant disciplinary differences in some studied characteristics across science and the social sciences. Originality/value The OA web citation network was dominated by PDF format files and non‐hyperlinked citations. This knowledge of characteristics shaping the OA citation network gives a better understanding about their potential uses for open access scholarly research.
Article
Purpose – The purpose of this paper is to present a narrative review of studies on the citing behavior of scientists, covering mainly research published in the last 15 years. Based on the results of these studies, the paper seeks to answer the question of the extent to which scientists are motivated to cite a publication not only to acknowledge intellectual and cognitive influences of scientific peers, but also for other, possibly non‐scientific, reasons. Design/methodology/approach – The review covers research published from the early 1960s up to mid‐2005 (approximately 30 studies on citing behavior‐reporting results in about 40 publications). Findings – The general tendency of the results of the empirical studies makes it clear that citing behavior is not motivated solely by the wish to acknowledge intellectual and cognitive influences of colleague scientists, since the individual studies reveal also other, in part non‐scientific, factors that play a part in the decision to cite. However, the results of the studies must also be deemed scarcely reliable: the studies vary widely in design, and their results can hardly be replicated. Many of the studies have methodological weaknesses. Furthermore, there is evidence that the different motivations of citers are “not so different or ‘randomly given’ to such an extent that the phenomenon of citation would lose its role as a reliable measure of impact”. Originality/value – Given the increasing importance of evaluative bibliometrics in the world of scholarship, the question “What do citation counts measure?” is a particularly relevant and topical issue.
Article
Citations support the communication of specialist knowledge by allowing authors and readers to make specific selections in several contexts at the same time. In the interactions between the social network of (first-order) authors and the network of their reflexive (that is, second-order) communications, a sub-textual code of communication with a distributed character has emerged. The recursive operation of this dual-layered network induces the perception of a cognitive dimension in scientific communication. Citation analysis reflects on citation practices. Reference lists are aggregated in scientometric analysis using one (or sometimes two) of the available contexts to reduce the complexity: geometrical representations (‘mappings’) of dynamic operations are reflected in corresponding theories of citation. For example, a sociological interpretation of citations can be distinguished from an information-theoretical one. The specific contexts represented in the modern citation can be deconstructed from the perspective of the cultural evolution of scientific communication.
Article
Assessing the quality of the knowledge produced by business and management academics is increasingly being metricated. Moreover, emphasis is being placed on the impact of the research rather than simply where it is published. The main metric for impact is the number of citations a paper receives. Traditionally this data has come from the ISI Web of Science but research has shown that this has poor coverage in the social sciences. A newer and different source for citations is Google Scholar. In this paper we compare the two on a dataset of over 4,600 publications from three UK Business Schools. The results show that Web of Science is indeed poor in the area of management and that Google Scholar, whilst somewhat unreliable, has a much better coverage. The conclusion is that Web of Science should not be used for measuring research impact in management.
Article
For practical reasons, bibliographic databases can only contain a subset of the scientific literature. The ISI citation databases are designed to cover the highest impact scientific research journals as well as a few other sources chosen by the Institute for Scientific Information (ISI). Google Scholar also contains citation information, but includes a less quality controlled collection of publications from different types of web documents. We define Google Scholar unique citations as those retrieved by Google Scholar which are not in the ISI database. We took a sample of 882 articles from 39 open access ISI-indexed journals in 2001 from biology, chemistry, physics and computing and classified the type, language, publication year and accessibility of the Google Scholar unique citing sources. The majority of Google Scholar unique citations (70%) were from full-text sources and there were large disciplinary differences between types of citing documents, suggesting that a wide range of non-ISI citing sources, especially from non-journal documents, are accessible by Google Scholar. This might be considered to be an advantage of Google Scholar, since it could be useful for citation tracking in a wider range of open access scholarly documents and to give a broader type of citation impact. An important corollary from our study is that Google Scholar’s wider coverage of Open Access (OA) web documents is likely to give a boost to the impact of OA research and the OA movement.
Article
This article describes the results of a network analysis based on the citation among Communication journals and those academic disciplines that are cited by those journals labeled as “Communication” by the Web of Science. The results indicate that the journals indexed solely as Communication rather than those also tagged as another social science are more central in the citation network. Further, a cluster analysis of the cited disciplines revealed three groupings, a micro psychological cluster, a macro socio-political group and a woman’s studies clique. A two-mode network analysis found that the most central Communication journals cited multiple clusters, while the peripheral journals cited only one, suggesting that the structure of influence on the field of Communication is more complex than suggested by Park and Leydesdorff (Scientometrics 81(1):157–175, 2009). Also, the results indicate that the macro cluster is about twice as influential as the micro cluster, rather than as Park and Leydesdorff suggest that Psychology is the discipline’s primary influence.
Article
Purpose The main purpose of this study is to assess the citation advantage for self‐archived Open Access (OA) agriculture research against its non‐OA counterparts. Design/methodology/approach At the article level, the paper compared the citation counts of self‐archived research with non‐OA articles based upon a sample of 400 research articles from ISI‐indexed (ISI, Institute for Scientific Information) agriculture journals in 2005. At the journal level the paper compared impact factors (IFs) of OA against non‐OA agriculture journals from 2005 to 2007 as reported by the ISI Journal Citation Reports. The paper also sought evidence of citation impact based on a random sample of 100 OA and 100 non‐OA publications from the Food and Agriculture Organization (FAO) of the United Nations in 2005. It used both ISI and Scopus databases for citation counting and also Google and Google Scholar for locating the self‐archived articles published in the non‐OA journals. Findings The results showed that there is an obvious citation advantage for self‐archived agriculture articles as compared to non‐OA articles. Out of a random sample of 400 articles published in non‐OA agriculture journals, about 14 per cent were OA and had a median citation count of four whereas the median for non‐OA articles was two. However, at the journal level the average IF for OA agriculture journals from 2005 to 2007 was 0.29, considerably lower than the average IF for non‐OA journals (0.65). Finally it found that FAO publications which were freely accessible online tended to attract more citations than non‐OA publications in the same year and had a mean citation count of 1.73 whereas the mean for non‐OA publications was 0.28. Originality/value Self‐archived agriculture research articles tended to attract higher citations than their non‐OA counterparts. This knowledge of the citation impact of OA agricultural research gives a better understanding about the potential effect of self‐archiving on the citation impact.
Article
In this paper we introduce a new data gathering method “Web/URL Citation” and use it and Google Scholar as a basis to compare traditional and Web-based citation patterns across multiple disciplines. For this, we built a sample of 1,650 articles from 108 Open Access (OA) journals published in 2001 in four science and four social science disciplines. We recorded the number of citations to the sample articles using several methods based upon the ISI Web of Science, Google Scholar and the Google search engine (Web/URL citations). For each discipline, we found significant correlations between ISI citations and both Google Scholar and Google Web/URL citations; with similar results when using total or average citations, and when comparing within and across (most) journals. We also investigated disciplinary differences. Google Scholar citations were more numerous than ISI citations in our four social science disciplines as well as in computer science, suggesting that Google Scholar is a more comprehensive tool for citation tracking in the social sciences and perhaps also in fast-moving fields where conference papers are highly valued and published online. The results for Web/URL citations suggested that counting a maximum of one hit per site produces a better measure for assessing the impact of OA journals or articles, because replicated web citations are very common within individual sites. The results can be considered as additional evidence that there is some commonality between traditional and Web-extracted citations.
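The "maximum of one hit per site" rule described above can be approximated by counting distinct hosts among the citing URLs. This is a minimal sketch under the assumption that a site is identified by its host name; the function name `count_citing_sites` and the example URLs are hypothetical, and the study's own definition of a site may differ.

```python
from urllib.parse import urlsplit


def count_citing_sites(urls):
    """Count distinct hosts among citing URLs, i.e. at most one hit per site."""
    hosts = {urlsplit(url).hostname for url in urls}
    hosts.discard(None)           # drop entries that have no recognisable host
    return len(hosts)


# Hypothetical citing pages for one article (illustrative only).
citing_urls = [
    "http://example.org/papers/a.pdf",
    "http://EXAMPLE.org/mirror/a.pdf",   # same site, replicated citation
    "https://example.edu/course/readings.html",
]
print(len(citing_urls), "raw hits ->", count_citing_sites(citing_urls), "sites")
```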
Article
We present several modifications of the Poisson and negative binomial models for count data to accommodate cases in which the number of zeros in the data exceed what would typically be predicted by either model. The excess zeros can masquerade as overdispersion. We present a new test procedure for distinguishing between zero inflation and overdispersion. We also develop a model for sample selection which is analogous to the Heckman style specification for continuous choice models. An application is presented to a data set on consumer loan behavior in which both of these phenomena are clearly present.
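To see why excess zeros can look like overdispersion, consider the standard zero-inflated Poisson form, used here only as an assumed illustration since the abstract does not state its exact specification. The symbols $\pi$ (probability of a structural zero) and $\lambda$ (Poisson mean) are introduced for this sketch.

```latex
\begin{align}
P(Y = 0) &= \pi + (1 - \pi)\,e^{-\lambda}, \\
P(Y = y) &= (1 - \pi)\,\frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad y = 1, 2, \dots \\
\mathbb{E}[Y] &= (1 - \pi)\,\lambda, \\
\operatorname{Var}[Y] &= (1 - \pi)\,\lambda\,(1 + \pi\lambda)
  \;\;\Longrightarrow\;\; \frac{\operatorname{Var}[Y]}{\mathbb{E}[Y]} = 1 + \pi\lambda .
\end{align}
```

Whenever $\pi > 0$ the variance-to-mean ratio exceeds 1 even though the count process itself is Poisson, which is why a separate test is needed to tell zero inflation apart from overdispersion.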
Article
Purpose – The purpose of this paper is to provide a close, detailed analysis of the frequency, nature, and depth of visible use of two of Foucault’s classic early works, The Archaeology of Knowledge and The Order of Things, by library and information science/studies (LIS) scholars. Design/methodology/approach – The study involved conducting extensive full-text searches in a large number of electronically available LIS journal databases to find citations of Foucault’s works, then examining each citing article and each individual citation to evaluate the nature and depth of each use. Findings – Contrary to initial expectations, the works in question are relatively little used by LIS scholars in journal articles, and where they are used, such use is often only vague, brief, or in passing. In short, works traditionally seen as central and foundational to discourse analysis appear relatively little in discussions of discourse. Research limitations/implications – The study was limited to a certain batch of LIS journal articles that are electronically available in full text at UCLA, where the study was conducted. The results potentially could change by focussing on a fuller or different collection of journals or on non-journal literature. More sophisticated bibliometric techniques could reveal different relative performance among journals. Other research approaches, such as discourse analysis, social network analysis, or scholar interviews, might reveal patterns of use and influence that are not visible in the journal literature. Originality/value – This study’s intensive, in-depth examination of quality as well as quantity of citations challenges some existing assumptions regarding citation analysis and the sociology of citation practices, and illuminates Foucault scholarship.
Article
Students in both social and natural sciences often seek regression methods to explain the frequency of events, such as visits to a doctor, auto accidents, or new patents awarded. This book, now in its second edition, provides the most comprehensive and up-to-date account of models and methods to interpret such data. The authors combine theory and practice to make sophisticated methods of analysis accessible to researchers and practitioners working with widely different types of data and software in areas such as applied statistics, econometrics, marketing, operations research, actuarial studies, demography, biostatistics and quantitative social sciences. The new material includes new theoretical topics, an updated and expanded treatment of cross-section models, coverage of bootstrap-based and simulation-based inference, expanded treatment of time series, multivariate and panel data, expanded treatment of endogenous regressors, coverage of quantile count regression, and a new chapter on Bayesian methods.
Article
Journal subject classification is important because it underpins scholarly information services and disciplinary research analysis. Classification by subject matter experts takes a significant amount of time, while classification based on journal information does not provide accurate subject information. To overcome these problems, this research proposes an automatic subject classification method that uses information on the SCI journals cited by domestic science and engineering journals, and investigates the classification results. We found that using all cited academic journals gives a better accuracy rate than using only the three or five most cited academic journals, and that the more academic journals are included in the analysis, the higher the accuracy. In particular, this research used the subject categories of Web of Science as the standard of subject classification and provides a foundation for comparing subject category structures of academic research results in KSCI and SCI.
Article
Citation analysis is an important tool used to trace scholarly research, measure impact, and justify tenure and funding decisions. Web of Science, which indexes peer-reviewed journal literature, has been the major research database for citation tracking. Changes in scholarly communication, including preprint/postprint servers, technical reports available via the internet, and open access e-journals are developing rapidly, and traditional citation tracking using Web of Science may miss much of this new activity. Two new tools are now available to count citations: Scopus and Google Scholar. This paper presents a case study comparing the citation counts provided by Web of Science, Scopus, and Google Scholar for articles from the Journal of the American Society for Information Science and Technology (JASIST) published in 1985 and in 2000 using a paired t-test to determine statistical significance. Web of Science provided the largest citation counts for the 1985 articles, although this could not be tested statistically. For JASIST articles published in 2000, Google Scholar provided statistically significant higher citation counts than either Web of Science or Scopus, while there was no significant difference between Web of Science and Scopus. The implications for measuring impact in a changing scholarly communication environment are examined.
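The paired comparison described above can be sketched as follows: for each article, take the difference between the counts reported by two sources and test whether the mean difference is zero. The function name `paired_t_statistic` and the counts are hypothetical and do not come from the case study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return the paired t statistic and its degrees of freedom.

    Each position in `first` and `second` must refer to the same article.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical citation counts for the same four articles in two databases.
source_a = [5, 7, 9, 11]
source_b = [4, 5, 6, 7]
t_stat, dof = paired_t_statistic(source_a, source_b)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Turning the statistic into a p-value requires the t distribution, which the Python standard library does not provide; a statistics package or a printed t table would be needed for that step.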
Article
This paper demonstrates that the conditional negative binomial model for panel data, proposed by Hausman, Hall and Griliches (1984), is not a true fixed-effects method. This method, which has been implemented in both Stata and LIMDEP, does not, in fact, control for all stable covariates. Three alternative methods are explored. A negative multinomial model yields the same estimator as the conditional Poisson estimator and, hence, does not provide any additional leverage for dealing with overdispersion. On the other hand, a simulation study yields good results from applying an unconditional negative binomial regression estimator with dummy variables to represent the fixed effects. There is no evidence for any incidental parameters bias in the coefficients, and downward bias in the standard error estimates can be easily and effectively corrected using the deviance statistic. Finally, an approximate conditional method is found to perform at about the same level as the unconditional estimator.
Article
Impact factor is a quasi-qualitative indicator, which provides a measurement of the prestige and international visibility of journals. Although the use of impact factor-based indicators for science policy purposes has increased over the last two decades, several limitations have been pointed out and should be borne in mind. The use of impact factor should be treated carefully when applied to the analysis of peripheral countries, whose national journals are hardly covered by ISI databases. Our experience in the use of impact factor based indicators for the analysis of the Spanish scientific production is shown. The usefulness of the impact factor measures in macro, meso and micro analyses is displayed. In addition, the main advantages, such as the great accessibility of impact factor and its ready-to-use nature are pointed out. Several limitations such as the need to avoid inter-field comparisons or the convenience of using a fixed journal set for international comparisons are also stressed. It is worth noting that the use of impact factor in the research evaluation process has influenced strongly the publication strategy of scientists.
Article
This article discusses the potential of Google Scholar as an alternative or complement to the Web of Science and Scopus for measuring the impact of journal articles in education. Three handbooks on research in science education, language education, and educational technology were used to identify a sample of 112 accomplished scholars. Google Scholar, Web of Science, and Scopus citations for 401 journal articles published by these authors during the 5-year period from 2003 to 2007 were then analyzed. The findings illustrate the promise and pitfalls of using Google Scholar for characterizing the influence of research output, particularly in terms of differences between the three subfields in publication practices. A calibration of the growth of Google Scholar citations is also provided.
Article
This paper was written to mark the 50th anniversary of Neyman and Scott's Econometrica paper defining the incidental parameter problem. It surveys the history both of the paper and of the problem in the statistics and econometrics literature.
Article
Fixed effects estimators of nonlinear panel models can be severely biased due to the incidental parameters problem. In this paper, I characterize the leading term of a large-T expansion of the bias of the MLE and estimators of average marginal effects in parametric fixed effects panel binary choice models. For probit index coefficients, the former term is proportional to the true value of the coefficients being estimated. This result allows me to derive a lower bound for the bias of the MLE. I then show that the resulting fixed effects estimates of ratios of coefficients and average marginal effects exhibit no bias in the absence of heterogeneity and negligible bias for a wide variety of distributions of regressors and individual effects in the presence of heterogeneity. I subsequently propose new bias-corrected estimators of index coefficients and marginal effects with improved finite sample properties for linear and nonlinear models with predetermined regressors.
Article
This study assesses the ways in which citation searching of scholarly print journals is and is not analogous to backlink searching of scholarly e-journal articles on the WWW, and identifies problems and issues related to conducting and interpreting such searches. Backlink searches are defined here as searches for Web pages that link to a given URL. Backlink searches were conducted on a sample of 39 scholarly electronic journals. Search results were processed to determine the number of backlinking pages, total backlinks, and external backlinks made to the e-journals and to their articles. The results were compared to findings from a citation study performed on the same e-journals in 1996. A content analysis of a sample of the files backlinked to e-journal articles was also undertaken. The authors identify a number of reliability issues associated with the use of “raw” search engine data to evaluate the impact of electronic journals and articles. No correlation was found between backlink measures and ISI citation measures of e-journal impact, suggesting that the two measures may be assessing something quite different. Major differences were found between the types of entities that cite, and those that backlink, e-journal articles, with scholarly works comprising a very small percentage of backlinking files. These findings call into question the legitimacy of using backlink searches to evaluate the scholarly impact of e-journals and e-journal articles (and by extension, e-journal authors).
Article
Web citations have been proposed as comparable to, even replacements for, bibliographic citations, notably in assessing the academic impact of work in promotion and tenure decisions. We compared bibliographic and Web citations to articles in 46 journals in library and information science. For most journals (57%), Web citations correlated significantly with both bibliographic citations listed in the Social Sciences Citation Index and the ISI's Journal Impact Factor. Many of the Web citations represented intellectual impact, coming from other papers posted on the Web (30%) or from class readings lists (12%). Web citation counts were typically higher than bibliographic citation counts for the same article. Journals with more Web citations tended to have Web sites that provided tables of contents on the Web, while less cited journals did not have such publicity. The number of Web citations to journal articles increased from 1992 to 1997.
Article
We review the problems of citation analysis. Most of them have either not been studied or have received only cursory attention. Since major error results when these problems are not taken into account, users of citation-based literature should proceed cautiously. © 1989 John Wiley & Sons, Inc.
Article
Doctoral Thesis submitted in partial fulfilment of the requirements for the award of PhD of Loughborough University. Four subjects, ecology, applied mathematics, sociology and economics, were selected to assess whether there is a citation advantage between journal articles that have an open access (OA) version on the Internet compared to those articles that are exclusively toll access (TA). In two rounds of data collection, citations were counted using the Web of Science and the OA status of articles was determined by using the search tools OAIster, OpenDOAR, Google and Google Scholar. In the first round a purposive sample of 4633 articles for the four subjects from high impact journals were examined, 2280 (49%) were OA and had a mean citation count of 9.04, whereas the mean for TA articles was 5.76. There was a clear citation advantage for those articles that were OA as opposed to those that were TA. This advantage, however, varied between disciplines, with sociology having the highest citation advantage but the lowest number of OA articles from the sample taken and ecology having the highest individual citation count for OA articles but the smallest citation advantage. Tests of correlation between OA status and a number of variables were generally found to be weak or inconsistent but some associations were significant. Google and Google Scholar were more successful at finding OA articles on the Internet than were OAIster or OpenDOAR. The country of origin of the citing authors for applied maths was found in order to assess whether those authors from poorer countries cited OA articles more frequently than TA articles. While cited to citing article ratios from lower income countries favoured OA articles, overall percentages gave mixed results. The data from the second round confirmed the result for sociology. The second sample for ecology was randomly taken from 82 journals and exhibited a greater OA advantage. 
For economics, a second purposive sample of articles from 21 mid-range impact journals was taken and also exhibited a greater OA advantage. In an attempt to establish the cause of any citation advantage, logistic regression was used to try to determine whether the bibliographic characteristics of the articles from both rounds could be used to predict OA status. Results from this were generally inconclusive.
Article
This article was published in the Journal of the American Society for Information Science [© 2008 ASIS&T] and the definitive version is available at: http://www3.interscience.wiley.com/journal/117946195/grouphome/home.html Four subjects, ecology, applied mathematics, sociology and economics, were selected to assess whether there is a citation advantage between journal articles that have an open access (OA) version on the Internet compared to those articles that are exclusively toll access (TA). Citations were counted using the Web of Science and the OA status of articles was determined by searching OAIster, OpenDOAR, Google and Google Scholar. Of a sample of 4633 articles examined, 2280 (49%) were OA and had a mean citation count of 9.04, whereas the mean for TA articles was 5.76. There appears to be a clear citation advantage for those articles that are OA as opposed to those that are TA. This advantage, however, varies between disciplines, with sociology having the highest citation advantage but the lowest number of OA articles from the sample taken and ecology having the highest individual citation count for OA articles but the smallest citation advantage. Tests of correlation or association between OA status and a number of variables were generally found to be weak or inconsistent. The cause of this citation advantage has not been determined.
Article
The canonical parameterization of the negative binomial derives directly from the exponential form of the negative binomial probability distribution function. Unlike the NB-2 and NB-1 parameterizations, it is not derived as a Poisson-gamma mixture model, and has the heterogeneity or ancillary parameter as a term in the mean and variance functions. However, the canonical negative binomial can be used effectively to model count response data. The Heterogeneous Canonical Negative Binomial command is similar to Stata's gnbreg command, allowing the ancillary parameter to itself be parameterized. The value of this option is that one may better understand which predictors influence model heterogeneity. That is, it assists in identifying the source of correlation in the data. The command also displays both the AIC and Deviance statistics to aid in model comparison and provides use of Stata's maximum likelihood and survey options.
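For orientation, the NB-1 and NB-2 parameterizations mentioned in this abstract are usually distinguished by their variance functions. The block below is a standard textbook summary rather than anything stated in the abstract itself; the symbols $\mu$ (conditional mean) and $\alpha$ (heterogeneity or ancillary parameter) are notation assumed here.

```latex
\begin{align}
\text{Poisson (equidispersion):}\quad & \operatorname{Var}(y) = \mu \\
\text{NB-1:}\quad & \operatorname{Var}(y) = \mu + \alpha\mu \\
\text{NB-2:}\quad & \operatorname{Var}(y) = \mu + \alpha\mu^{2}
\end{align}
```

With $\alpha > 0$ both negative binomial forms allow the variance to exceed the mean, which is the overdispersion that the Poisson model rules out.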
Article
We show that under the alternative hypothesis the Hausman chi-square test statistic can be negative not only in small samples but even asymptotically. Therefore in large samples such a result is only compatible with the alternative and should be interpreted accordingly. Applying a known insight from finite samples, this can only occur if the different estimation precisions (often the residual variance estimates) under the null and the alternative both enter the test statistic. In finite samples, using the absolute value of the test statistic is a remedy that does not alter the null distribution and is thus admissible. Even for positive test statistics the relevant covariance matrix difference should be routinely checked for positive semi-definiteness, because we also show that otherwise test results may be misleading. Of course the preferable solution still is to impose the same nuisance parameter (i.e., residual variance) estimate under the null and alternative hypotheses, if the model context permits that with relative ease. We complement the likelihood-based exposition by a formal proof in an omitted-variable context, we present simulation evidence for the test of panel random effects, and we illustrate the problems with a panel homogeneity test.
Article
The ISI® Journal Citation Reports (JCR®) impact factor has moved in recent years from an obscure bibliometric indicator to become the chief quantitative measure of the quality of a journal, its research papers, the researchers who wrote those papers, and even the institution they work in. This pamphlet looks at the limitations of the impact factor, how it can and how it should not be used.
Article
I first mentioned the idea of an impact factor in Science in 1955.¹ With support from the National Institutes of Health, the experimental Genetics Citation Index was published, and that led to the 1961 publication of the Science Citation Index.² Irving H. Sher and I created the journal impact factor to help select additional source journals. To do this we simply re-sorted the author citation index into the journal citation index. From this simple exercise, we learned that initially a core group of large and highly cited journals needed to be covered in the new Science Citation Index (SCI). Consider that, in 2004, the Journal of Biological Chemistry published 6500 articles, whereas articles from the Proceedings of the National Academy of Sciences were cited more than 300 000 times that year. Smaller journals might not be selected if we rely solely on publication count,³ so we created the journal impact factor (JIF).
Article
This paper explores the patterns of citations among patents taken out by inventors in the U.S., the U.K., France, Germany and Japan. We find that (1) patents assigned to the same firm are more likely to cite each other, and these citations come sooner than other citations; (2) patents in the same patent class are approximately 100 times as likely to cite each other as patents from different patent classes, but there is not a strong time pattern to this effect; (3) patents whose inventors reside in the same country are typically 30 to 80% more likely to cite each other than inventors from other countries, and these citations come sooner; and (4) there are clear country-specific citation tendencies, e.g., Japanese citations typically come sooner than those of other countries.
Article
This paper investigates the impact of international migration on technical efficiency, resource allocation and income from agricultural production of family farming in Albania. The results suggest that migration is used by rural households as a pathway out of agriculture: migration is negatively associated with both labour and non-labour input allocation in agriculture, while no significant differences can be detected in terms of farm technical efficiency or agricultural income. Whether the rapid demographic changes in rural areas triggered by massive migration, possibly combined with propitious land and rural development policies, will ultimately produce the conditions for a more viable, high-return agriculture attracting larger investments remains to be seen.
Article
Using the result that under the null hypothesis of no misspecification an asymptotically efficient estimator must have zero asymptotic covariance with its difference from a consistent but asymptotically inefficient estimator, specification tests are devised for a number of model specifications in econometrics. Local power is calculated for small departures from the null hypothesis. An instrumental variable test as well as tests for a time series cross section model and the simultaneous equation model are presented. An empirical model provides evidence that unobserved individual factors are present which are not orthogonal to the included right-hand-side variable in a common econometric specification of an individual wage equation.
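The zero-covariance result in this abstract implies that the variance of the difference between the two estimators equals the difference of their covariance matrices, which gives the familiar quadratic-form test statistic. The sketch below computes that statistic in plain Python. The function names and all numbers are hypothetical, and the small linear solver is included only to keep the example self-contained.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance difference is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def hausman_statistic(b_consistent, b_efficient, v_consistent, v_efficient):
    """Return d' (V_c - V_e)^{-1} d, where d = b_c - b_e."""
    d = [bc - be for bc, be in zip(b_consistent, b_efficient)]
    v_diff = [
        [vc - ve for vc, ve in zip(row_c, row_e)]
        for row_c, row_e in zip(v_consistent, v_efficient)
    ]
    solved = solve_linear(v_diff, d)
    return sum(di * si for di, si in zip(d, solved))


# Hypothetical estimates: two coefficients from each estimator.
b_c = [1.0, 0.5]                      # consistent but inefficient
b_e = [0.8, 0.6]                      # efficient under the null
v_c = [[0.05, 0.0], [0.0, 0.04]]
v_e = [[0.01, 0.0], [0.0, 0.02]]
h_stat = hausman_statistic(b_c, b_e, v_c, v_e)
```

Under the null hypothesis the statistic is compared against a chi-square distribution with as many degrees of freedom as there are coefficients being compared. If the estimated covariance difference is not positive semi-definite, the quadratic form can turn out negative.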
Article
In data with a group structure, incidental parameters are included to control for missing variables. Applications include longitudinal data and sibling data. In general, the joint maximum likelihood estimator of the structural parameters is not consistent as the number of groups increases, with a fixed number of observations per group. Instead a conditional likelihood function is maximized, conditional on sufficient statistics for the incidental parameters. In the logit case, a standard conditional logit program can be used. Another solution is a random effects model, in which the distribution of the incidental parameters may depend upon the exogenous variables.
Conference Paper
Currently, more and more research papers are being published in the form of digital libraries on the World Wide Web. How to search them efficiently and effectively is a big challenge for researchers. With a static subject tree to index the paper and with the traditional query mechanism, many problems appear. Detail-level topics can hardly be found, emerging new areas cannot be identified, non-semantically related papers can hardly be retrieved, and the key papers in the area cannot be obviously pointed out. In order to solve these problems, this paper proposes a novel approach to map the citation retrieval problem into a graph partitioning problem. All citations in a digital library are mapped to a citation graph through their reference links. It is observed that the citation graph is not evenly connected. Highly connected sub-graphs often emerge. Different sub-graphs represent different topics, and partitioning the graph at higher levels can reveal the detail-level topics. The different connectivities can also help in finding hot topics, related topics and key citations. Since all these can be done automatically and efficiently, the user's manual effort in searching for citations is saved, but the results are more comprehensive and accurate.
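The idea of mapping reference links to a citation graph and grouping connected papers can be illustrated with a small sketch. The abstract does not describe the paper's partitioning algorithm, so this example uses plain connected components as a crude stand-in. The function name and the paper identifiers are hypothetical.

```python
from collections import defaultdict, deque


def citation_components(reference_links):
    """Group papers into connected components of an undirected citation graph.

    reference_links is an iterable of (citing_paper, cited_paper) pairs.
    """
    adjacency = defaultdict(set)
    for citing, cited in reference_links:
        # Direction is ignored: a link in either direction relates two papers.
        adjacency[citing].add(cited)
        adjacency[cited].add(citing)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every paper reachable from start.
        queue = deque([start])
        seen.add(start)
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical reference links between five papers.
links = [("A", "B"), ("B", "C"), ("D", "E")]
groups = citation_components(links)
```

A real partitioning method would also split large components along weakly connected cuts; connected components only separate papers that share no citation path at all.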
Article
An analysis of 2,765 articles published in four math journals from 1997 to 2005 indicates that articles deposited in the arXiv received 35% more citations on average than non-deposited articles (an advantage of about 1.1 citations per article), and that this difference was most pronounced for highly-cited articles. Open Access, Early View, and Quality Differential were examined as three non-exclusive postulates for explaining the citation advantage. There was little support for a universal Open Access explanation, and no empirical support for Early View. There was some inferential support for a Quality Differential brought about by more highly-citable articles being deposited in the arXiv. In spite of their citation advantage, arXiv-deposited articles received 23% fewer downloads from the publisher's website (about 10 fewer downloads per article) in all but the most recent two years after publication. The data suggest that arXiv and the publisher's website may be fulfilling distinct functional needs of the reader.