Article
PDF Available

Abstract and Figures

The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose various methods to estimate the current size (number of indexed documents) of Google Scholar (May 2014) and to determine its validity, precision and reliability. To do this, we present, apply and discuss three empirical methods: an external estimate based on empirical studies of Google Scholar coverage, and two internal estimate methods based on direct, empty and absurd queries, respectively. The results, despite providing disparate values, place the estimated size of Google Scholar at around 160-165 million documents. However, all the methods show considerable limitations and uncertainties due to inconsistencies in the Google Scholar search functionalities.
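To make the logic of the external estimation method concrete, here is a minimal sketch (the function name and structure are illustrative assumptions, not the paper's code); the 57 million and 3x figures echo the Web of Science comparison quoted in the citing excerpts below.

```python
# Illustrative sketch of the "external estimate" logic: scale a reference
# database of known size by a coverage ratio reported in empirical studies.
# The figures are placeholders echoing the WoS comparison cited below,
# not the paper's exact inputs.

def external_estimate(reference_db_size: int, gs_to_reference_ratio: float) -> int:
    """Extrapolate Google Scholar's size from a reference database of known
    size and the average GS-to-reference result ratio observed in studies."""
    return round(reference_db_size * gs_to_reference_ratio)

# Example: Web of Science ~57 million records, GS returning roughly 3x as
# many documents for comparable queries -> ~171 million documents in GS.
print(external_estimate(57_000_000, 3.0))  # 171000000
```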
... The total number of documents in the GS database, without any language restriction, is between 160 and 165 million (Orduña-Malea et al., 2015; Waltman, 2016). With 389 million records, GS is currently the most comprehensive academic search engine (Gusenbauer, 2019). ...
Article
Full-text available
Whilst there is some literature on the effect of inward foreign direct investment on domestic investment for the whole economy and the agricultural sector, that of foreign divestment on domestic investment for food manufacturing is rare. This paper contributes to the literature by estimating the crowding effect of foreign divestment on domestic investment in the food manufacturing sector using an unbalanced panel of 29 countries from 1991 to 2019. Foreign divestment crowded out domestic investment for developed countries in the short and long runs. In terms of the absolute reduction in domestic investment, the short-run effect is higher than the long-run effect. Policies to attract inward foreign direct investment and retain it should be pursued.
... An early study estimated that Google Scholar had indexed 87% (100 million of 114 million) of all English-language scholarly documents on the web (Khabsa & Giles, 2014; broadly agreeing with Aguillo, 2012). Another study estimated that in May 2014 Google Scholar had three times more scholarly records than the Web of Science (171 million compared to 57 million) (Orduña-Malea et al., 2015). More recently, Google Scholar was estimated to index 389 million scholarly related records, substantially more than Scopus (72.2 million) and the Web of Science (67.7 million) (Gusenbauer, 2019). ...
Preprint
Full-text available
This literature review identifies indicators associated with higher impact or higher quality research from article text (e.g., titles, abstracts, lengths, cited references and readability) or metadata (e.g., the number of authors, international or domestic collaborations, journal impact factors and authors' h-index). This includes studies that used machine learning techniques to predict citation counts or quality scores for journal articles or conference papers. The literature review also includes evidence about the strength of association between bibliometric indicators and quality score rankings from previous UK Research Assessment Exercises (RAEs) and REFs in different subjects and years, and similar evidence from other countries (e.g., Australia and Italy). In support of this, the document also surveys studies that used public datasets of citations, social media indicators or open review texts (e.g., Dimensions, OpenCitations, Altmetric.com and Publons) to help predict the scholarly impact of articles. The results of this part of the literature review were used to inform the experiments using machine learning to predict REF journal article quality scores, as reported in the AI experiments report for this project. The literature review also covers technology to automate editorial processes, to provide quality control for papers and reviewers' suggestions, to match reviewers with articles, and to automatically categorise journal articles into fields. Bias and transparency in technology-assisted assessment are also discussed.
... This limitation, which might also affect a widening group of journals and publishers that are embracing open citations, encompasses the corpus of papers and gray scientific literature that is published as open access (OA) and that does not have a DOI, but that is nonetheless cited in indexed literature, or that is published in refereed journals. One case is Google Scholar, which was already estimated to have, back in 2014, about 160 million documents (Orduña-Malea et al., 2015), but in 2019 was estimated to have over 300 million records (Delgado Lopez-Cozar et al., 2019). This is important because Google Scholar, which, unlike Scopus and WoS, is free to access, was able to identify 88% of about 3.1 million citations, surpassing the ability of Scopus or WoS (Martín-Martín et al., 2021). ...
Article
Full-text available
Citations in a scientific paper reference other studies and form the information backbone of that paper. If cited literature is valid and non-retracted, an analysis of citations can offer unique perspectives about the supportive or contradictory nature of a statement. Yet, such analyses are still limited by the relative lack of access to open citation data. The creation of open citation databases (OCDs) allows for data analysts, bibliometric specialists and other academics interested in such topics to independently verify the validity and accuracy of a citation. Since the strength of an individual's curriculum vitae can be based on, and assessed by, metrics (citation counts, altmetric mentions, journal ranks, etc.), there is interest in appreciating citation networks and their link to research performance. Open citations would thus not only benefit career, funding and employment initiatives, they could also be used to reveal citation rings, abusive author-author or journal-journal citation strategies, or to detect false or erroneous citations. OCDs should be open to the public, and publishers have a moral responsibility of releasing citation data for free use and academic exploration. Some challenges remain, including long-term funding, and data and information security.
... The objective of Google Scholar was to bring Google search simplicity to the academic environment, but it has crawled the whole web, indexing any record with a seemingly academic structure (Martín-Martín et al. [56]). By using this inclusive approach, Google Scholar provides comprehensive coverage of scientific/academic documents without following the selective journal-based inclusion policies (Orduña-Malea et al. [57]; Van Noorden [58]; and Martín-Martín et al. [59]). Google Scholar covers over 300 million records (Delgado López-Cózar et al. [60]). Under the WoS website, the wealth of data extracted for the bibliometric review leads indubitably to data processing and interpretation challenges (Solomon [62]). ...
Article
Full-text available
Fractional programming (FP) refers to a family of optimization problems whose objective function is a ratio of two functions. FP has been studied extensively in economics, management science, information theory, optics and graph theory, communication, and computer science. This paper presents a bibliometric review of the FP-related publications over the past five decades in order to track research outputs and scholarly trends in the field. The review is conducted through the Science Citation Index Expanded (SCI-EXPANDED) database of the Web of Science Core Collection (Clarivate Analytics). Based on the bibliometric analysis of 1811 documents, various theme-related research indicators were described, such as the most prominent authors, the most commonly cited papers, journals, institutions, and countries. Three research directions emerged, including Electrical and Electronic Engineering, Telecommunications, and Applied Mathematics.
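For readers unfamiliar with the term, the general form of a fractional program implied by the first sentence can be written as follows (notation chosen here for illustration; it is not taken from any specific reviewed paper):

```latex
% A fractional program optimises a ratio of two functions f and g over a
% feasible set X, with g assumed positive on X.
\[
  \min_{x \in X} \; \frac{f(x)}{g(x)}, \qquad g(x) > 0 \quad \text{for all } x \in X
\]
```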
... In line with other SLRs, and to ensure access to an extensive range of journals to reduce the risk of excluding relevant studies, this study used two main databases, Google Scholar (GS) and Web of Science (WoS), along with three other databases: ScienceDirect, SSRN, and EBSCO. According to Orduña-Malea et al. (2015), GS includes any document with a seemingly academic structure, which leads to potentially massive coverage of the scholarly literature. Also, the WoS, with more than 17,000 international journals in different research areas, is considered one of the most extensive databases. ...
Thesis
The current thesis seeks to enhance our understanding and the existing knowledge of the impact of corporate governance (CG) on sustainability reporting (SR) around the world. This is achieved by carrying out three distinctive, but intimately connected papers. These are: (i) an up-to-date systematic review of the current empirical research investigating the relationship between CG and SR; (ii) an examination of the influence of CG on total SR and separately on its three dimensions, and whether the influence differs between developed and developing countries; and (iii) whether the efficacy of the CG-SR nexus depends on sampling decision, and whether this relationship is significantly different between the financial and non-financial sectors.

The first paper conducts a systematic literature review (SLR) of the relationship between CG and SR. The final sample includes 117 empirical studies conducted in over 50 countries from 2000 to 2019 and published in 72 scholarly journals. The paper finds that very few articles examine all three dimensions of SR (economic, environmental and social). The paper also shows that most previous studies are based on developing countries and exclude the financial sector from the investigation. Additionally, the majority of prior studies focus on the quantity of SR and apply single rather than multiple theoretical frameworks, with agency theory being the dominant theoretical lens. Moreover, the findings of the influence of board attributes frequently examined (size, independence, gender diversity, and CEO duality) are conflicting. Thus, this paper provides suggestions for future research on the CG-SR nexus.

The second paper investigates the impact of CG, with a particular reference to board characteristics (i.e. board size, board independence, CEO duality, board gender diversity, and the existence of a sustainability committee (SC)) on total SR practices and separately on each dimension (economic, environmental and social) based on stakeholder-agency theory. Using a sample of 370 firms from 50 countries (22 developed countries and 28 developing countries) in 2017 and a Global Reporting Initiative (GRI) standards-based disclosure index to measure the level of SR across various reporting mediums, the paper shows that the impact of several board characteristics differs by dimension. Then, the paper conducts further analysis by dividing the sample into developed and developing countries. The findings show that the relationship between some board attributes and total SR differs between developed and developing countries.

Following similar analysis, and drawing on agency and resource dependence theories, the third paper conducts sector-based research of the CG-SR nexus. Specifically, this paper, first, explores whether the efficacy of several board mechanisms (i.e. board size, board independence, CEO duality, board gender diversity, board age, board tenure, and the presence of SC) on SR practices differs depending on sampling decision. Second, the paper examines the differences in the effect of these governance mechanisms on SR practices between financial and non-financial firms. Using data relating to 370 companies (104 from the financial sector and 266 from the non-financial sector) belonging to 50 countries in 2017 and a disclosure index based on GRI standards to quantify the SR activities, the paper finds that the chosen sample influences the relationship between some board characteristics and SR. Furthermore, the paper suggests that several board attributes affect SR practices in financial and non-financial sectors differently.
... This study takes a new methodological route of analyzing, estimating, and comparing subject coverage across databases, an approach necessary to allow the assessment of overall subject coverage for a large number of databases. It applies the method of query hit counts (QHC) used in scientometric analyses (e.g., Da Teixeira Silva et al., 2020; Gusenbauer, 2019; Kousha & Thelwall, 2020; Lazarus et al., 2020; Orduña-Malea et al., 2015) to determine subject coverage. The QHC method is particularly beneficial as it allows access to the entire database without requiring the download of individual records. ...
Article
Full-text available
This paper introduces a novel scientometrics method and applies it to estimate the subject coverages of many of the popular English-focused bibliographic databases in academia. The method uses query results as a common denominator to compare a wide variety of search engines, repositories, digital libraries, and other bibliographic databases. The method extends existing sampling-based approaches that analyze smaller sets of database coverages. The findings show the relative and absolute subject coverages of 56 databases—information that has often not been available before. Knowing the databases’ absolute subject coverage allows the selection of the most comprehensive databases for searches requiring high recall/sensitivity, particularly relevant in lookup or exploratory searches. Knowing the databases’ relative subject coverage allows the selection of specialized databases for searches requiring high precision/specificity, particularly relevant in systematic searches. The findings illustrate not only differences in the disciplinary coverage of Google Scholar, Scopus, or Web of Science, but also of less frequently analyzed databases. For example, researchers might be surprised that Meta (discontinued), Embase, or Europe PMC are found to cover more records than PubMed in Medicine and other health subjects. These findings should encourage researchers to re-evaluate their go-to databases, including against newly introduced options. Searching with more comprehensive databases can improve retrieval, particularly when the selection of the most fitting databases requires careful thought, such as in systematic reviews and meta-analyses. This comparison can also help librarians and other information experts re-evaluate expensive database procurement strategies. Researchers without institutional access learn which open databases are likely most comprehensive in their disciplines.
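A minimal sketch of the query hit count (QHC) comparison described above might look like the following; the database names and hit counts are hypothetical placeholders, not results from the study.

```python
# Sketch of the QHC idea: issue the same query to several databases and use
# the reported hit counts as a common denominator for relative coverage.
# All names and numbers below are hypothetical placeholders.

hit_counts = {
    "Database A": 1_200_000,
    "Database B": 450_000,
    "Database C": 3_900_000,
}

total = sum(hit_counts.values())
for db, hits in sorted(hit_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{db}: {hits:,} hits ({hits / total:.1%} of the combined total)")
```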
Book
Full-text available
This monograph is the outcome of the 11th edition of the Polish National Scientific Conference „Narzędzia analityczne w naukach ekonomicznych" (Analytical Tools in the Economic Sciences) and therefore mainly contains selected applications of statistical and econometric tools to the modelling of various types of problems in economics and finance. The volume also highlights the role of business informatics and bibliometrics as a potential source of analytical tools for solving problems in the economic sciences. The chapters were written by students, doctoral candidates and early-career researchers (see the table of contents in the text). The person listed as the author serves as the scientific editor of the monograph and wrote only the introduction and the back-cover text.
Article
Full-text available
Vascular epiphytes stand out in tropical forests in terms of diversity. However, no comprehensive review of the group in the Amazon region has been performed so far. We carried out a literature review of the scientific knowledge on vascular epiphytes in the Amazon, aiming to identify the main gaps, limitations and perspectives for studies on the subject. Searches were conducted in Google Scholar, Scopus and Web of Science using inclusion and exclusion criteria. 291 articles published in the period 1933-2022, mostly from the 21st century, were included in the review. Brazil was the most studied country. However, knowledge gaps were found in regions located in the Brazilian arc of deforestation as well as in areas of Bolivia, Guyana, French Guiana and Suriname. There was a predominance of studies related to the floristics, systematics and biogeography of spermatophytes and ferns, focusing on the diversity and taxonomy of certain families (e.g. Orchidaceae). However, we found gaps for more comprehensive research concerning population dynamics, dominance (biomass), guidelines for the evaluation of epiphytes, and the systematization of data for the Amazon. We indicate the need for studies focused on ecology, floral and reproductive biology, biochemistry, phytochemistry, anatomy and physiology. Future research should also consider the impacts of current trends in deforestation and climate change on the diversity of vascular epiphytes in the Amazon.
Article
This study examines the development of research on the topic of Pancasila. It aims to determine the quantity of international publications on Pancasila and to map the development of international publications in this field based on the keyword Pancasila. Data were collected by searching the international publication index with Publish or Perish, using the Google Scholar option, the keyword Pancasila and the categories article title, abstract and keywords for the period 2017-2022. The recorded data comprise the number of international publications per year and the journals that published articles on Pancasila. Trends were mapped with the VosViewer application. The results show that Pancasila research in 2017-2022 still concentrates on rather general and repetitive topics such as law, education, character and behaviour. Research activity was most intense in 2020. Topics that are still rarely studied, and that are recommended for future research, include specific themes such as radicalism.
Article
Google Scholar has become an important search platform for students in higher education and, as such, can be regarded as a competitor to university libraries. Previous research has explored students’ intention to use Google Scholar (GS) and University Digital Libraries (UDLs), but there is a lack of comparative studies that explore students’ preferences between these two platforms. Therefore, this study seeks to explore the search behaviour of a select group of users, international postgraduate students, and more specifically compares the factors that influence their use of GS and UDLs. A questionnaire-based survey, based on the factors in the UTAUT model (unified theory of acceptance and use of technology), was conducted to collect data on the acceptance and use of GS and UDLs respectively. Data were collected from 400 international postgraduate students studying in the United Kingdom. Confirmatory factor analysis was used to establish the contextual influencing factors, whilst structural equation modelling examined the predicted model. The results suggest some differences in the influence of various factors between the UDL dataset and the GS dataset. They suggest that social influence (SI) did not affect behavioural intention (BI) for either dataset, but that for the UDL dataset effort expectancy did not affect BI, whereas for the GS dataset facilitating conditions did not influence BI. The approach taken in this study further facilitates research into the use of search tools, to progress beyond ease of use as a main driver and to explore the relationship between internal and external influences on use. Recommendations for further research are suggested, and the value of the insights gained for UDLs and their provision and support for all students is discussed.
Article
Full-text available
Purpose – The purpose of this paper is to describe the obsolescence process of Microsoft Academic Search (MAS), the effects of this decline on the coverage of disciplines and journals, and their influence on the representativeness of organizations. Design/methodology/approach – The total number of records, and those belonging to the most reputable journals (1,762) and organizations (346) according to the Field Rating indicator in each of the 15 fields and 204 sub-fields of MAS, were collected and statistically analysed in March 2014 by means of an automated querying process via http, covering academic publications from 1700 to the present. Findings – MAS has not been updated since 2013, although this phenomenon began to be glimpsed in 2011, when its coverage plummeted. Throughout 2014, indexing of new records is still ongoing, but at a minimal rate and without following any apparent pattern. Research limitations/implications – There are also retrospective records being indexed at present. In this sense, this research provides a picture of what MAS offered during March 2014 when queried directly via http. Practical implications – The unnoticed obsolescence of MAS affects the quality of the service offered to its users (both those who engage in scientific information seeking and those who use it for quantitative purposes). Social implications – The predominance of Google Scholar (GS) as a monopoly in the academic search engine market, as well as the prevalence of an open construction model (GS) vs a closed model (MAS). Originality/value – A complete longitudinal analysis of disciplines, journals and organizations on MAS has been performed for the first time, identifying an unnoticed obsolescence. No public explanation or disclaimer has been issued by the responsible company, something incomprehensible given the implications for the reliability and validity of the bibliometric data provided on disciplines, journals, authors and conferences, as well as their fair representation in the academic search engine.
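The kind of coverage collapse described above could be spotted with a simple year-over-year check such as the sketch below; the record counts are fabricated placeholders and the code is not the authors' querying pipeline.

```python
# Illustrative check (not the authors' code): flag the first year in which a
# database's yearly record count drops sharply relative to the previous year.
# The counts below are made-up placeholders.

yearly_records = {
    2008: 980_000, 2009: 1_010_000, 2010: 1_050_000,
    2011: 310_000, 2012: 90_000, 2013: 12_000,
}

def first_collapse(records, drop_ratio=0.5):
    """Return the first year whose count falls below drop_ratio times the
    previous year's count, or None if no such drop exists."""
    years = sorted(records)
    for prev, curr in zip(years, years[1:]):
        if records[curr] < drop_ratio * records[prev]:
            return curr
    return None

print(first_collapse(yearly_records))  # -> 2011
```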
Working Paper
Full-text available
The study of highly cited documents on Google Scholar (GS) has never been addressed to date in a comprehensive manner. The objective of this work is to identify the set of highly cited documents in Google Scholar and define their core characteristics: their languages, their file formats, and how many of them can be accessed free of charge. We will also try to answer some additional questions that hopefully shed some light on the use of GS as a tool for assessing scientific impact through citations. The decalogue of research questions is shown below:
1. Which are the most cited documents in GS?
2. Which are the most cited document types in GS?
3. In what languages are the most cited documents in GS written?
4. How many highly cited documents are freely accessible?
4.1 What file types are most commonly used to store these highly cited documents?
4.2 Which are the main providers of these documents?
5. How many of the highly cited documents indexed by GS are also indexed by WoS?
6. Is there a correlation between the number of citations these highly cited documents have received in GS and the number of citations they have received in WoS?
7. How many versions of these highly cited documents has GS detected?
8. Is there a correlation between the number of versions GS has detected for these documents and the number of citations they have received?
9. Is there a correlation between the number of versions GS has detected for these documents and their position in the search engine result pages?
10. Is there some relation between the positions these documents occupy in the search engine result pages and the number of citations they have received?
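As a hedged illustration of how the correlation questions (6, 8 and 9) might be examined, a rank correlation between paired counts could be computed as below; the citation figures are fabricated placeholders, not data from this study.

```python
# Sketch: Spearman rank correlation between GS and WoS citation counts for
# the same set of highly cited documents. Values are fabricated placeholders.
from scipy.stats import spearmanr

gs_citations = [5200, 4800, 3100, 2900, 2500, 2400, 2100, 1900]
wos_citations = [4100, 3600, 2800, 1200, 2300, 2000, 1500, 1700]

rho, p_value = spearmanr(gs_citations, wos_citations)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```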
Article
Full-text available
The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available since they do not require a subscription or payment of any kind. In addition, at a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary, as defined by Microsoft Academic Search. In addition, we show that among these fields the percentage of documents defined as freely available varies significantly, i.e., from 12 to 50%.
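For context, capture/recapture estimates of this kind commonly rest on the Lincoln-Petersen estimator; the abstract does not state the exact variant used, so the expression below is only the standard textbook form, with symbols chosen here for illustration.

```latex
% Lincoln-Petersen capture/recapture estimator of the total population N:
% n_1 = documents found by the first "capture" (e.g., one search engine),
% n_2 = documents found by the second, m = documents found by both.
\[
  \hat{N} \;=\; \frac{n_1 \, n_2}{m}
\]
```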
Article
Full-text available
Despite its increasing role in communication, the world wide web remains the least controlled medium: any individual or institution can create websites with an unrestricted number of documents and links. While great efforts are made to map and characterize the Internet's infrastructure, little is known about the topology of the web. Here we take a first step to fill this gap: we use local connectivity measurements to construct a topological model of the world wide web, allowing us to explore and characterize its large scale properties.
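The large-scale properties referred to in this abstract are usually summarized by power-law degree distributions and a logarithmically growing average shortest path ("diameter"); the constants below are the commonly quoted values and should be verified against the original paper.

```latex
% Commonly quoted summary of the topological model (verify against the paper):
% power-law in/out-degree distributions and an average shortest path that
% grows only logarithmically with the number of documents N.
\[
  P_{\mathrm{out}}(k) \sim k^{-\gamma_{\mathrm{out}}}, \qquad
  P_{\mathrm{in}}(k) \sim k^{-\gamma_{\mathrm{in}}}, \qquad
  \langle d \rangle \approx 0.35 + 2.06 \,\log_{10} N
\]
```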
Article
American universities today serve as economic engines, performing the scientific research that will create new industries, drive economic growth, and keep the United States globally competitive. But only a few decades ago, these same universities self-consciously held themselves apart from the world of commerce. Creating the Market University is the first book to systematically examine why academic science made such a dramatic move toward the market. Drawing on extensive historical research, Elizabeth Popp Berman shows how the government--influenced by the argument that innovation drives the economy--brought about this transformation. Americans have a long tradition of making heroes out of their inventors. But before the 1960s and '70s neither policymakers nor economists paid much attention to the critical economic role played by innovation. However, during the late 1970s, a confluence of events--industry concern with the perceived deterioration of innovation in the United States, a growing body of economic research on innovation's importance, and the stagnation of the larger economy--led to a broad political interest in fostering invention. The policy decisions shaped by this change were diverse, influencing arenas from patents and taxes to pensions and science policy, and encouraged practices that would focus specifically on the economic value of academic science. By the early 1980s, universities were nurturing the rapid growth of areas such as biotech entrepreneurship, patenting, and university-industry research centers. Contributing to debates about the relationship between universities, government, and industry, Creating the Market University sheds light on how knowledge and politics intersect to structure the economy.
Book
Academic Search Engines intends to survey the current panorama of academic search engines through a quantitative approach that analyses the reliability and consistency of these services. The objective is to describe the main characteristics of these engines, to highlight their advantages and drawbacks, and to discuss the implications of these new products for the future of scientific communication and their impact on research measurement and evaluation. In short, Academic Search Engines presents a summary view of the new challenges that the Web poses to scientific activity through the most novel and innovative search services available on the Web.
Article
Giant academic social networks have taken off to a degree that no one expected even a few years ago. A Nature survey explores why.
Article
Bing and Google customize their results to target people with different geographic locations and languages but, despite the importance of search engines for web users and webometric research, the extent and nature of these differences are unknown. This study compares the results of seventeen random queries submitted automatically to Bing for thirteen different English geographic search markets at monthly intervals. Search market choice alters a small majority of the top 10 results but less than a third of the complete sets of results. Variation in the top 10 results over a month was about the same as variation between search markets but variation over time was greater for the complete results sets. Most worryingly for users, there were almost no ubiquitous authoritative results: only one URL was always returned in the top 10 for all search markets and points in time, and Wikipedia was almost completely absent from the most common top 10 results. Most importantly for webometrics, results from at least three different search markets should be combined to give more reliable and comprehensive results, even for queries that return fewer than the maximum number of URLs.
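To illustrate the kind of comparison reported above, the overlap between the top 10 results returned for the same query in two search markets could be measured as in this sketch; the URLs are placeholders, not data from the study.

```python
# Sketch: fraction of top-10 URLs shared by two geographic search markets for
# one query. The URLs below are placeholders, not results from the study.

def top10_overlap(results_a, results_b):
    """Fraction of the top 10 URLs that appear in both result lists."""
    shared = set(results_a[:10]) & set(results_b[:10])
    return len(shared) / 10

market_uk = [f"http://example.org/page{i}" for i in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)]
market_us = [f"http://example.org/page{i}" for i in (1, 2, 3, 11, 5, 12, 7, 13, 9, 14)]

print(f"Shared top-10 results: {top10_overlap(market_uk, market_us):.0%}")
```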