Abstract and Figures

The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose various methods to estimate the current size (number of indexed documents) of Google Scholar (May 2014) and to determine its validity, precision and reliability. To do this, we present, apply and discuss three empirical methods: an external estimate based on empirical studies of Google Scholar coverage, and two internal estimate methods based on direct, empty and absurd queries, respectively. The results, despite providing disparate values, place the estimated size of Google Scholar at around 160-165 million documents. However, all the methods show considerable limitations and uncertainties due to inconsistencies in the Google Scholar search functionalities.
... The objective of Google Scholar was to bring Google search simplicity to the academic environment, but it has crawled the whole web by indexing any record with seemingly academic structure (Martín-Martín et al. [56]). By using this inclusive approach, Google Scholar provides comprehensive coverage of scientific/academic documents without following the selective journal-based inclusion policies (Orduña-Malea et al. [57]; Van Noorden [58]; and Martín-Martín et al. [59]). Google Scholar covers over 300 million records (Delgado López-Cózar et al. [60]). Under the WoS website, the wealth of data extracted for the bibliometric review leads indubitably to data processing and interpretation challenges (Solomon [62]). ...
Article
Full-text available
Fractional programming (FP) refers to a family of optimization problems whose objective function is a ratio of two functions. FP has been studied extensively in economics, management science, information theory, optics, graph theory, communication, and computer science. This paper presents a bibliometric review of the FP-related publications over the past five decades in order to track research outputs and scholarly trends in the field. The review is conducted through the Science Citation Index Expanded (SCI-EXPANDED) database of the Web of Science Core Collection (Clarivate Analytics). Based on the bibliometric analysis of 1811 documents, various theme-related research indicators were described, such as the most prominent authors, the most commonly cited papers, journals, institutions, and countries. Three research directions emerged: Electrical and Electronic Engineering, Telecommunications, and Applied Mathematics.
... In line with other SLRs, and to ensure access to an extensive range of journals to reduce the risk of excluding relevant studies, this study used two main databases, Google Scholar (GS) and Web of Science (WoS), along with three other databases: ScienceDirect, SSRN, and EBSCO. According to Orduña-Malea et al. (2015), GS includes any document with a seemingly academic structure, which leads to potentially massive coverage of the scholarly literature. Also, the WoS, with more than 17,000 international journals in different research areas, is considered one of the most extensive databases. ...
Thesis
The current thesis seeks to enhance our understanding and the existing knowledge of the impact of corporate governance (CG) on sustainability reporting (SR) around the world. This is achieved by carrying out three distinctive, but intimately connected papers. These are: (i) an up-to-date systematic review of the current empirical research investigating the relationship between CG and SR; (ii) an examination of the influence of CG on total SR and separately on its three dimensions, and whether the influence differs between developed and developing countries; and (iii) whether the efficacy of the CG-SR nexus depends on sampling decision, and whether this relationship is significantly different between the financial and non-financial sectors. The first paper conducts a systematic literature review (SLR) of the relationship between CG and SR. The final sample includes 117 empirical studies conducted in over 50 countries from 2000 to 2019 and published in 72 scholarly journals. The paper finds that very few articles examine all three dimensions of SR (economic, environmental and social). The paper also shows that most previous studies are based on developing countries and exclude the financial sector from the investigation. Additionally, the majority of prior studies focus on the quantity of SR and apply single rather than multiple theoretical frameworks, with agency theory being the dominant theoretical lens. Moreover, the findings of the influence of board attributes frequently examined (size, independence, gender diversity, and CEO duality) are conflicting. Thus, this paper provides suggestions for future research on the CG-SR nexus. The second paper investigates the impact of CG, with a particular reference to board characteristics (i.e. board size, board independence, CEO duality, board gender diversity, and the existence of sustainability committee (SC)) on total SR practices and separately on each dimension (economic, environmental and social) based on stakeholder-agency theory. Using a sample of 370 firms from 50 countries (22 developed countries and 28 developing countries) in 2017 and a Global Reporting Initiative (GRI) standards-based disclosure index to measure the level of SR across various reporting mediums, the paper shows that the impact of several board characteristics differs by dimension. Then, the paper conducts further analysis by dividing the sample into developed and developing countries. The findings show that the relationship between some board attributes and total SR differs between developed and developing countries. Following similar analysis, and drawing on agency and resource dependence theories, the third paper conducts sector-based research of the CG-SR nexus. Specifically, this paper, first, explores whether the efficacy of several board mechanisms (i.e. board size, board independence, CEO duality, board gender diversity, board age, board tenure, and the presence of SC) on SR practices differs depending on sampling decision. Second, the paper examines the differences in the effect of these governance mechanisms on SR practices between financial and non-financial firms. Using data relating to 370 companies (104 from the financial sector and 266 from the non-financial sector) belonging to 50 countries in 2017 and a disclosure index based on GRI standards to quantify the SR activities, the paper finds that the chosen sample influences the relationship between some board characteristics and SR. Furthermore, the paper suggests that several board attributes affect SR practices in financial and non-financial sectors differently.
... This study takes a new methodological route of analyzing, estimating, and comparing subject coverage across databases, an approach necessary to allow the assessment of overall subject coverage for a large number of databases. It applies the method of query hit counts (QHC) used in scientometric analyses (e.g., Da Teixeira Silva et al., 2020; Gusenbauer, 2019; Kousha & Thelwall, 2020; Lazarus et al., 2020; Orduña-Malea et al., 2015) to determine subject coverage. The QHC method is particularly beneficial as it allows access to the entire database without requiring the download of individual records. ...
Article
Full-text available
This paper introduces a novel scientometrics method and applies it to estimate the subject coverages of many of the popular English-focused bibliographic databases in academia. The method uses query results as a common denominator to compare a wide variety of search engines, repositories, digital libraries, and other bibliographic databases. The method extends existing sampling-based approaches that analyze smaller sets of database coverages. The findings show the relative and absolute subject coverages of 56 databases—information that has often not been available before. Knowing the databases’ absolute subject coverage allows the selection of the most comprehensive databases for searches requiring high recall/sensitivity, particularly relevant in lookup or exploratory searches. Knowing the databases’ relative subject coverage allows the selection of specialized databases for searches requiring high precision/specificity, particularly relevant in systematic searches. The findings illustrate not only differences in the disciplinary coverage of Google Scholar, Scopus, or Web of Science, but also of less frequently analyzed databases. For example, researchers might be surprised how Meta (discontinued), Embase, or Europe PMC are found to cover more records than PubMed in Medicine and other health subjects. These findings should encourage researchers to re-evaluate their go-to databases, also against newly introduced options. Searching with more comprehensive databases can improve finding, particularly when selecting the most fitting databases needs particular thought, such as in systematic reviews and meta-analyses. This comparison can also help librarians and other information experts re-evaluate expensive database procurement strategies. Researchers without institutional access learn which open databases are likely most comprehensive in their disciplines.
... To conduct the systematic literature review on the plant species used in green infrastructure (GI) projects for stormwater management in Brazil, Google Scholar was used: in addition to showing a large overlap with traditional databases such as Scopus, Web of Science and Microsoft Academic Search (ORDUNA-MALEA et al., 2015), Google Scholar is also recommended by Meho and Yang (2007) because it has the greatest coverage of non-English languages. ...
Conference Paper
Full-text available
Currently, one of the gaps in the development of Green Infrastructure (GI) landscape projects lies in the choice of species that are suited (i) to the different existing typologies and (ii) to regional particularities. Thus, the objective of this work was to carry out a systematic literature review on GI in Brazil, focusing on typologies for stormwater management and recommended species. To this end, a search was carried out in Google Scholar, which identified 196 articles published in national journals up to the end of 2019 that simultaneously contained the terms ‘espécie’ (species) and ‘infraestrutura verde’ (green infrastructure), and spelling variations. However, since GI is a broad concept and can encompass several typologies, the articles underwent an exploratory reading to verify which were directly related to the research objective. Of this material, seven articles fell within the scope, in which 82 plant species recommended for use in projects were identified, some with extensive documented experience in water treatment devices and others only suggested for future use. In addition, complementary information on the species was researched to enable their use in landscape projects, such as: indication of use; botanical family; life form; substrate; origin; endemism; and distribution across the different Brazilian Phytogeographic Domains (Amazon, Caatinga, Cerrado, Atlantic Forest, Pampa and Pantanal). This material was organized as a database, which may support the development and consolidation of a plant repertoire for use in landscape projects of this kind in Brazil.
... Soon after its creation in 2004, GS received major criticism (see Orduna-Malea et al. [1] review), but subsequently, further studies described it more positively [2,3]. Indeed, the literature acknowledges the free access offered by GS [3][4][5] and the quality of its coverage [6][7][8][9][10][11][12]. The coverage of GS is considered better than that of both Web of Science (WoS) [12][13][14][15] and Scopus [9,10], which are GS's fee-based competitors. ...
Article
Google Scholar (GS) is a free tool that may be used by researchers to analyze citations, to find appropriate literature or to evaluate the quality of an author or a contender for tenure, promotion, a faculty position, funding or research grants. GS has become a major bibliographic and citation database. Following the literature, databases such as PubMed, PsycINFO, Scopus or Web of Science can be used in place of GS because they are more reliable. The aim of this study is to examine the accuracy of citation data collected from GS and provide a comprehensive description of the errors and miscounts identified. For this purpose, 281 documents that cited two specific works were retrieved via the Publish or Perish (PoP) software and examined. This work studied the false positive issue inherent in the analysis of neuroimaging data. The results reveal an unprecedented error rate: 279 of the 281 examined references (99.3%) contain at least one error. The nonacademic documents tend to contain more errors than the academic publications (U=5117.0, P<.001). Full text: https://www.jmir.org/2022/5/e28354
... Google Scholar was used as a search engine due to its academic structure. It encompasses more scientific and scholarly literature than Web of Science (WoS) and SCOPUS as these are more selective in terms of which journals are included in their databases (Orduna-Malea et al., 2015). Moreover, Google Scholar has expanded its coverage over time which makes it a powerful database for this kind of research. ...
Article
Full-text available
Universities in Asia have observed a conspicuously large-scale adoption of English Medium Instruction (EMI), particularly in China and Malaysia. With the requirement that content teachers conduct classes in Chinese and Malaysian universities, content teachers' identities would be reshaped into that of EMI teachers. However, the nature of policy directives concerning EMI, code-switching to mother tongue in the classrooms, the dearth of professional development programs, the intervention of local materials and instructional deficiency of the teachers, and nationalism have complicated the development of content teachers' identity as EMI teachers. The current study aimed at understanding the identities of university teachers involved in EMI courses in Malaysian and Chinese universities. Undertaking a systematic review of the published studies concerning the EMI phenomenon in Malaysian and Chinese universities, and discussing the findings in the light of the dialogical approach proposed by Akkerman and Meijer (2011), the study reported that tensions between teachers' identity as content teachers on the one hand and as EMI teachers on the other hand offered multiple positions (I-positions) that teachers need to interpret and evaluate through dialogues. The dialogues have continued among several I-positions. Eventually, the coherent identity of the EMI teachers developed as bilingual teachers, which has been incongruent with the policy directives. The implication of this study would help the stakeholders rethink bilingual higher education where English along with the national language would prevail to pave the way for knowledge acquisition. Moreover, the study set ground for future researchers to consider such an area for empirical exploration since teacher identity in conjunction with EMI has been relatively less explored.
Resumen: Universities in Asia have adopted the use of English as a medium of instruction on a large scale, particularly in China and Malaysia. With the stipulation that content teachers teach classes in Chinese and Malaysian universities, the identity of content teachers would be shaped into that of EMI teachers. However, we find that the nature of the policy directives regarding EMI, switching to the mother tongue in classrooms, the shortage of professional development programs, the intervention of local materials and the instructional deficiency of teachers, and nationalism produce difficulties for developing the content. Based on the "dialogical approach" suggested by Akkerman and Meijer (2011), the study found that the tensions between the identity of content teachers on the one hand and EMI teachers on the other engender multiple entities that teachers need to interpret and evaluate through continuous dialogues among several positions. Finally, the coherent identity of EMI teachers developed as bilingual teachers, which has been incongruent with the policy directives.
... Google Scholar, which started in 2004, is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines. The database index includes most peer-reviewed online academic journals and books, conference papers, theses and dissertations, preprints, abstracts, technical reports, and other scholarly literature, including court opinions and patents (Orduña-Malea et al., 2015). According to a publication in Scientometrics, Google Scholar is estimated to contain roughly 389 million documents, including articles, citations, and patents, making it the world's largest academic search engine in January 2018 (Gusenbauer, 2019) Table 3. ...
Article
Full-text available
Anthropogenic activities (population, economic growth and energy consumption) driving climate change are increasing rapidly and reinforcing global greenhouse gas emissions in Africa. For energy consumption, the most significant aspects are electricity, heat, and transportation. Existing studies on decarbonisation process in the electricity generation industry in Africa are reviewed in this work. Systematic review guided the systematic identification of literature related to deep decarbonisation process as applied to electricity generation. The studies were reviewed in the context of decarbonisation as simply being the process of reducing or eliminating CO2 from energy sources in electricity generation. Two academic search engines and bibliographic databases were engaged. Twenty-one papers from both databases were analysed and reported under broad themes of technology and political economy. The study reveals the status of power sector decarbonisation in Africa, showing very limited efforts have been made as regards energy transition process, with absence of decarbonisation in any of the countries’ power-pool. Also, successful decarbonisation of the power sector in sub-Saharan African countries would require alignment of the political and economic forces in a productive manner. The study recommends further efforts be made on the use of Agriculture, Forestry, and Land Use sectors in decarbonization process in Africa, and a critical alignment of its policies for climate change reduction efforts.
Article
Google Scholar has become an important search platform for students in higher education, and, as such, can be regarded as a competitor to university libraries. Previous research has explored students’ intention to use Google Scholar (GS) and University Digital Libraries (UDLs), but there is a lack of comparative studies that explore students’ preferences between these two platforms. Therefore, this study seeks to explore the search behaviour of a select group of users, international postgraduate students, and more specifically compares the factors that influence their use of GS and UDLs. A questionnaire-based survey, based on the factors in the UTAUT model (unified theory of acceptance and use of technology), was conducted to collect data on acceptance and use of technology of GS and UDLs respectively. Data was collected from 400 international postgraduate students studying in the United Kingdom. Confirmatory factor analysis was used to establish the contextual influencing factors, whilst structural equation modelling examined the predicted model. The results suggest some differences in the influence of various factors between the UDL dataset and the GS dataset. They suggest that social influence (SI) did not affect behavioural intention (BI) for either dataset, but that for the UDL dataset, effort expectancy did not affect BI, whereas for the GS dataset facilitating conditions did not influence BI. The approach taken in this study further facilitates research into the use of search tools to progress beyond ease of use as a main driver and to explore the relationship between internal and external influences of use. Recommendations for further research are suggested and the value of the insights gained for UDLs and their provision and support for all students is discussed.
Article
Full-text available
This paper describes the collections, products and bibliometric indicators of Web of Science, with special emphasis on their usefulness and importance in research evaluation activities. It also sets out the main limitations of its coverage and indicators, which affect the analysis of scientific output in peripheral countries and/or regions and in fields of knowledge with less representation in the source. The specific contributions of the database to the different activities and phases of scientific research are also discussed, for stakeholders such as researchers, journals, publishing groups and libraries. Specifically, the volume of data is shown and its collections, products and indicators are detailed, together with an assessment of some positive and negative aspects. Comparisons are made with other information sources available in the scientific research market that likewise support bibliometric research, giving the reader a substantial characterization of the tool and its competitors that helps to understand its prospects for use within the research landscape. The ideas developed and systematized in the text lead to the conclusion that, despite its relevance for scientific activity at different levels and aggregates, the biases of its indicators, the lack of access to the source at many institutions and the existence of other tools with similar features and ease of use are aspects that must be taken into account because they affect its application, future use and permanence in the research ecosystem.
Article
Full-text available
Purpose – The purpose of this paper is to describe the obsolescence process of Microsoft Academic Search (MAS) as well as the effects of this decline in the coverage of disciplines and journals, and their influence in the representativeness of organizations. Design/methodology/approach – The total number of records and those belonging to the most reputable journals (1,762) and organizations (346) according to the Field Rating indicator in each of the 15 fields and 204 sub-fields of MAS, have been collected and statistically analysed in March 2014, by means of an automated querying process via http, covering academic publications from 1700 to present. Findings – MAS has not been updated since 2013, although this phenomenon began to be glimpsed in 2011, when its coverage plummeted. Throughout 2014, indexing of new records is still ongoing, but at a minimum rate, without following any apparent pattern. Research limitations/implications – There are also retrospective records being indexed at present. In this sense, this research provides a picture of what MAS offered during March 2014 being queried directly via http. Practical implications – The unnoticed obsolescence of MAS affects the quality of the service offered to its users (both those who engage in scientific information seeking and also those who use it for quantitative purposes). Social implications – The predominance of Google Scholar (GS) as a monopoly in the academic search engines market as well as the prevalence of an open construction model (GS) vs a closed model (MAS). Originality/value – A complete longitudinal analysis of disciplines, journals and organizations on MAS has been performed for the first time, identifying an unnoticed obsolescence. No public explanation or disclaimer has been issued by the responsible company, which is incomprehensible given its implications for the reliability and validity of bibliometric data provided on disciplines, journals, authors and conferences as well as their fair representation on the academic search engine.
Working Paper
Full-text available
The study of highly cited documents on Google Scholar (GS) has never been addressed to date in a comprehensive manner. The objective of this work is to identify the set of highly cited documents in Google Scholar and define their core characteristics: their languages, their file format, or how many of them can be accessed free of charge. We will also try to answer some additional questions that hopefully shed some light on the use of GS as a tool for assessing scientific impact through citations. The decalogue of research questions is shown below: 1. Which are the most cited documents in GS? 2. Which are the most cited document types in GS? 3. In what languages are the most cited documents in GS written? 4. How many highly cited documents are freely accessible? 4.1 What file types are the most commonly used to store these highly cited documents? 4.2 Which are the main providers of these documents? 5. How many of the highly cited documents indexed by GS are also indexed by WoS? 6. Is there a correlation between the number of citations that these highly cited documents have received in GS and the number of citations they have received in WoS? 7. How many versions of these highly cited documents has GS detected? 8. Is there a correlation between the number of versions GS has detected for these documents, and the number of citations they have received? 9. Is there a correlation between the number of versions GS has detected for these documents, and their position in the search engine result pages? 10. Is there some relation between the positions these documents occupy in the search engine result pages, and the number of citations they have received?
Article
Full-text available
The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available since they do not require a subscription or payment of any kind. In addition, at a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary, as defined by Microsoft Academic Search. In addition, we show that among these fields the percentage of documents defined as freely available varies significantly, i.e., from 12 to 50%.
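The capture/recapture idea in the abstract above can be illustrated with a minimal sketch. It assumes the classic two-sample Lincoln-Petersen estimator; the abstract does not state which estimator the authors used, and the function name and sample counts below are hypothetical. The final lines only check that the abstract's own figures (27 million freely available out of 114 million) round to the stated 24%.

```python
def lincoln_petersen(n_a: int, n_b: int, n_both: int) -> float:
    """Estimate a total population size from two overlapping samples.

    n_a    -- items found in sample A (e.g. one search engine)
    n_b    -- items found in sample B (e.g. another search engine)
    n_both -- items found in both samples
    """
    if n_both <= 0:
        raise ValueError("the two samples must overlap to give an estimate")
    return n_a * n_b / n_both


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
estimate = lincoln_petersen(n_a=100, n_b=80, n_both=40)

# Estimated share of the population that sample A covers.
coverage_a = 100 / estimate

# Consistency check on the figures quoted in the abstract (in millions).
free_share_percent = round(27 / 114 * 100)
```

The estimator treats the two engines as independent samples of the same population. Real indexes rarely satisfy that assumption, which is one reason to read such results as rough, assumption-dependent estimates.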
Article
Full-text available
Despite its increasing role in communication, the world wide web remains the least controlled medium: any individual or institution can create websites with an unrestricted number of documents and links. While great efforts are made to map and characterize the Internet's infrastructure, little is known about the topology of the web. Here we take a first step to fill this gap: we use local connectivity measurements to construct a topological model of the world wide web, allowing us to explore and characterize its large scale properties. Comment: 5 pages, 1 figure, updated with most recent results on the size of the www
Article
American universities today serve as economic engines, performing the scientific research that will create new industries, drive economic growth, and keep the United States globally competitive. But only a few decades ago, these same universities self-consciously held themselves apart from the world of commerce. Creating the Market University is the first book to systematically examine why academic science made such a dramatic move toward the market. Drawing on extensive historical research, Elizabeth Popp Berman shows how the government--influenced by the argument that innovation drives the economy--brought about this transformation. Americans have a long tradition of making heroes out of their inventors. But before the 1960s and '70s neither policymakers nor economists paid much attention to the critical economic role played by innovation. However, during the late 1970s, a confluence of events--industry concern with the perceived deterioration of innovation in the United States, a growing body of economic research on innovation's importance, and the stagnation of the larger economy--led to a broad political interest in fostering invention. The policy decisions shaped by this change were diverse, influencing arenas from patents and taxes to pensions and science policy, and encouraged practices that would focus specifically on the economic value of academic science. By the early 1980s, universities were nurturing the rapid growth of areas such as biotech entrepreneurship, patenting, and university-industry research centers. Contributing to debates about the relationship between universities, government, and industry, Creating the Market University sheds light on how knowledge and politics intersect to structure the economy.
Book
Academic Search Engines surveys the current panorama of academic search engines through a quantitative approach that analyses the reliability and consistency of these services. The objective is to describe the main characteristics of these engines, to highlight their advantages and drawbacks, and to discuss the implications of these new products for the future of scientific communication and their impact on research measurement and evaluation. In short, Academic Search Engines presents a summary view of the new challenges that the Web poses to scientific activity through the most novel and innovative searching services available on the Web.
Article
Giant academic social networks have taken off to a degree that no one expected even a few years ago. A Nature survey explores why.
Article
Bing and Google customize their results to target people with different geographic locations and languages but, despite the importance of search engines for web users and webometric research, the extent and nature of these differences are unknown. This study compares the results of seventeen random queries submitted automatically to Bing for thirteen different English geographic search markets at monthly intervals. Search market choice alters a small majority of the top 10 results but less than a third of the complete sets of results. Variation in the top 10 results over a month was about the same as variation between search markets but variation over time was greater for the complete results sets. Most worryingly for users, there were almost no ubiquitous authoritative results: only one URL was always returned in the top 10 for all search markets and points in time, and Wikipedia was almost completely absent from the most common top 10 results. Most importantly for webometrics, results from at least three different search markets should be combined to give more reliable and comprehensive results, even for queries that return fewer than the maximum number of URLs.