Chapter

Google Scholar's Filter Bubble: An Inflated Actuality?


Abstract

This chapter investigates the allegation that the popular online search engine Google applies algorithms to personalise search results, thereby yielding different results for the exact same search terms. It specifically examines whether the same alleged filter bubble applies to Google's academic product, Google Scholar. It reports the results of an exploratory experiment with nine keywords carried out for this purpose, varying factors such as discipline (Natural Science, Social Science and Humanities), geographic location (north/south), and seniority (senior/junior researchers). It also reports a short survey on academic search behaviour. The findings suggest that while Google Scholar, together with Google, has emerged as THE dominant search engine among the participants of this study, the alleged filter bubble is only mildly observable. The Jaccard similarity of search results for all nine keywords is strikingly high, with only one keyword exhibiting a localised bubble at the 95% level. This chapter therefore concludes that the filter bubble phenomenon does not warrant concern.
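For readers unfamiliar with the measure, the snippet below is a minimal sketch of how the Jaccard similarity between two users' result lists for the same keyword might be computed; the example URLs and the choice to compare results as unordered sets are illustrative assumptions, not the chapter's actual data or procedure.

    # Minimal sketch: Jaccard similarity between two search result sets.
    # The result lists below are hypothetical, not data from the study.

    def jaccard_similarity(results_a, results_b):
        """Return |A intersect B| / |A union B| for two collections of result identifiers."""
        a, b = set(results_a), set(results_b)
        if not a and not b:
            return 1.0  # two empty result sets are treated as identical
        return len(a & b) / len(a | b)

    # Hypothetical top-5 result URLs seen by two users searching the same keyword.
    user_north = ["url1", "url2", "url3", "url4", "url5"]
    user_south = ["url1", "url2", "url3", "url4", "url6"]

    print(jaccard_similarity(user_north, user_south))  # 4 shared / 6 in union = 0.667

A value close to 1 means two users see nearly identical results; a pronounced filter bubble would show up as noticeably lower values for some user pairs.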


Article
Purpose: The purpose of this paper is to compare the content of Web of Science (WoS) and Google Scholar (GS) by searching the interdisciplinary field of climate and ancient societies. The authors aim at analyzing the retrieved documents by open availability, received citations, co-authors and type of publication.
Design/methodology/approach: The authors searched the services with a defined set of keywords. Data were retrieved and analyzed using a variety of bibliometric tools such as Publish or Perish, Sci2Tool and Gephi. In order to determine the proportion of open full texts based on the WoS result, the authors relocated the records in GS, using an off-campus internet connection.
Findings: The authors found that the top 1,000 downloadable and analyzable GS items matched poorly with the items retrieved by WoS. Based on this approach (subject searching), the services appeared complementary rather than similar. Even though the first search results differ considerably by service, almost every single WoS title could be located in GS. Based on GS's full-text recognition, the authors found 74 percent of WoS items openly available, and the citation median of these was twice as high as for documents behind paywalls.
Research limitations/implications: Even though the study is a case study, the authors believe that the findings are transferable to other interdisciplinary fields. The share of freely available documents, however, may depend on the investigated field and its culture toward open publishing.
Practical implications: Discovering the literature of interdisciplinary fields puts scholars in a challenging situation and requires a better understanding of the existing infrastructures. The authors hope that the paper contributes to that and can advise the research and library communities.
Originality/value: In light of an overwhelming and exponentially growing amount of literature, the bibliometric approach is new in a library context.
Full-text available
Article
Google Scholar was released as a beta product in November of 2004. Since then, Google Scholar has been scrutinized and questioned by many in academia and the library field. Our objectives in undertaking this study were to determine how scholarly Google Scholar is in comparison with traditional library resources and to determine if the scholarliness of materials found in Google Scholar varies across disciplines. We found that Google Scholar is, on average, 17.6 percent more scholarly than materials found only in library databases and that there is no statistically significant difference between the scholarliness of materials found in Google Scholar across disciplines.
Full-text available
Article
Online information intermediaries such as Facebook and Google are slowly replacing traditional media channels thereby partly becoming the gatekeepers of our society. To deal with the growing amount of information on the social web and the burden it brings on the average user, these gatekeepers recently started to introduce personalization features, algorithms that filter information per individual. In this paper we show that these online services that filter information are not merely algorithms. Humans not only affect the design of the algorithms, but they also can manually influence the filtering process even when the algorithm is operational. We further analyze filtering processes in detail, show how personalization connects to other filtering techniques, and show that both human and technical biases are present in today’s emergent gatekeepers. We use the existing literature on gatekeeping and search engine bias and provide a model of algorithmic gatekeeping.
Full-text available
Article
This study evaluates the effectiveness of simple and expert searches in Google Scholar (GS), EconLit, GEOBASE, PAIS, POPLINE, PubMed, Social Sciences Citation Index, Social Sciences Full Text, and Sociological Abstracts. It assesses the recall and precision of 32 searches in the field of later-life migration: nine simple keyword searches and 23 expert searches constructed by demography librarians at three top universities. For simple searches, Google Scholar's recall and precision are well above average. For expert searches, the relative effectiveness of GS depends on the number of results users are willing to examine. Although Google Scholar's expert-search performance is just average within the first fifty search results, GS is one of the few databases that retrieves relevant results with reasonably high precision after the fiftieth hit. The results also show that simple searches in GS, GEOBASE, PubMed, and Sociological Abstracts have consistently higher recall and precision than expert searches. This can be attributed not to differences in expert-search effectiveness, but to the unusually strong performance of simple searches in those four databases.
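As a reminder of the two measures used throughout this comparison, the following sketch computes recall and precision for a single search against a set of relevance judgements; the document identifiers are hypothetical, and the snippet only illustrates the definitions rather than reproducing the study's evaluation procedure.

    # Minimal sketch: recall and precision of one search result set.
    # 'retrieved' and 'relevant' are hypothetical document identifiers.

    def recall_precision(retrieved, relevant):
        retrieved, relevant = set(retrieved), set(relevant)
        hits = retrieved & relevant
        recall = len(hits) / len(relevant) if relevant else 0.0
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        return recall, precision

    retrieved = ["d1", "d2", "d3", "d4", "d5"]  # documents the search returned
    relevant = ["d1", "d3", "d6", "d7"]         # documents judged relevant
    print(recall_precision(retrieved, relevant))  # (0.5, 0.4)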
Full-text available
Article
The scope of the article is to give a literature review comparing the two services. To obtain insight into Google Scholar, it is tested against Web of Science (WoS), the most recognized proprietary database for peer-reviewed journal content. Both databases are multidisciplinary, provide links to library holdings and offer opportunities for export of references. In addition, they offer the powerful feature of tracking citing items. Comparisons are based on database content, recall and research impact measures. The article touches on library teaching issues at higher education institutions, and argues why Google Scholar, along with WoS, is worth including in library programs for information literacy teaching. Google Scholar is popular among faculty staff and students, but has been met with scepticism by library professionals and has therefore not yet been established as a subject for teaching.
Full-text available
Article
Web search engines apply a variety of ranking signals to achieve user satisfaction, i.e., results pages that provide the best possible results to the user. While these ranking signals implicitly consider credibility (e.g., by measuring popularity), explicit measures of credibility are not applied. In this chapter, credibility in Web search engines is discussed in a broad context: credibility as a measure for including documents in a search engine's index, credibility as a ranking signal, credibility in the context of universal search results, and the possibility of using credibility as an explicit measure for ranking purposes. It is found that while search engines, at least to a certain extent, show credible results to their users, there is no fully integrated credibility framework for Web search engines.
Full-text available
Article
This paper reports on an exploratory study of how university students perceive and interact with Web search engines compared to Web-based OPACs. A qualitative study was conducted involving sixteen students, eight of whom were first-year undergraduates and eight of whom were graduate students in Library and Information Science. The participants performed searches on Google and on a university OPAC. The interviews and think-afters revealed that while students were aware of the problems inherent in Web searching and of the many ways in which OPACS are more organized, they generally preferred Web searching. The coding of the data suggests that the reason for this preference lies in psychological factors associated with the comparative ease with which search engines can be used, and system and interface factors which made searching the Web much easier and less confusing. As a result of these factors, students were able to approach even the drawbacks of the Web—its clutter of irrelevant pages and the dubious authority of the results—in an enthusiastic and proactive manner, very different from the passive and ineffectual admiration they expressed for the OPAC. The findings suggest that requirements of good OPAC interface design must be aggressively redefined in the face of new, Web-based standards of usability.
Full-text available
Article
The Institute for Scientific Information's (ISI) citation databases have been used for decades as a starting point and often as the only tools for locating citations and/or conducting citation analyses. ISI databases (or Web of Science [WoS]), however, may no longer be sufficient because new databases and tools that allow citation searching are now available. Using citations to the work of 25 library and information science faculty members as a case study, this paper examines the effects of using Scopus and Google Scholar (GS) on the citation counts and rankings of scholars as measured by WoS. Overall, more than 10,000 citing and purportedly citing documents were examined. Results show that Scopus significantly alters the relative ranking of those scholars that appear in the middle of the rankings and that GS stands out in its coverage of conference proceedings as well as international, non-English language journals. The use of Scopus and GS, in addition to WoS, helps reveal a more accurate and comprehensive picture of the scholarly impact of authors. WoS data took about 100 hours of collecting and processing time, Scopus consumed 200 hours, and GS a grueling 3,000 hours.
Full-text available
Article
In order to measure the degree to which Google Scholar can compete with bibliographical databases, search results from this database are compared with Thomson's ISI WoS (Institute for Scientific Information, Web of Science). For earth science literature, 85% of documents indexed by ISI WoS were recalled by Google Scholar. The rank of records displayed in Google Scholar and ISI WoS is compared by means of Spearman's footrule. For impact measures, the h-index is investigated. Similarities in measures were significant for the two sources.
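Spearman's footrule, mentioned above, simply sums the absolute displacement of each record's rank between the two services. The sketch below illustrates the basic computation on a pair of hypothetical rankings; ignoring records that appear in only one list is an assumption made here for simplicity, not necessarily the paper's handling of partial overlap.

    # Minimal sketch: Spearman's footrule distance between two rankings
    # of the same documents. The rankings are hypothetical.

    def spearman_footrule(ranking_a, ranking_b):
        """Sum of absolute rank differences for items present in both rankings."""
        pos_b = {doc: i for i, doc in enumerate(ranking_b)}
        return sum(abs(i - pos_b[doc]) for i, doc in enumerate(ranking_a) if doc in pos_b)

    google_scholar = ["d1", "d2", "d3", "d4"]
    web_of_science = ["d2", "d1", "d3", "d4"]
    print(spearman_footrule(google_scholar, web_of_science))  # 2: d1 and d2 each shift one place

A footrule of 0 means the two services rank the shared records identically; larger values indicate increasing disagreement in ordering.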
Full-text available
Article
Sharing research resources of different kinds, in new ways, and on an increasing scale, is a central element of the unfolding e-Research vision. Web 2.0 is seen as providing the technical platform to enable these new forms of scholarly communications. We report findings from a study of the use of Web 2.0 services by UK researchers and their use in novel forms of scholarly communication. We document the contours of adoption, the barriers and enablers, and the dynamics of innovation in Web services and scholarly practices. We conclude by considering the steps that different stakeholders might take to encourage greater experimentation and uptake.
Article
Our goal in this chapter is to draw on empirical work about preference formation and welfare to propose a distinctive form of paternalism, libertarian in spirit, one that should be acceptable to those who are firmly committed to freedom of choice on grounds of either autonomy or welfare. Indeed, we urge that a kind of ‘libertarian paternalism’ provides a basis for both understanding and rethinking many social practices, including those that deal with worker welfare, consumer protection, and the family. In the process of defending these claims, we intend to make some objections to widely held beliefs about both freedom of choice and paternalism. Our major emphasis is on the fact that in many domains, people lack clear, stable, or well-ordered preferences. What they choose is strongly influenced by details of the context in which they make their choice, for example default rules, framing effects (that is, the wording of possible options), and starting points. These contextual influences render the very meaning of the term ‘preferences’ unclear. If social planners are asked to respect preferences, or if they are told that respect for preferences promotes well-being, they will often be unable to know what they should do. Consider the question whether to undergo a risky medical procedure. When people are told, ‘Of those who undergo this procedure, 90 percent are still alive after five years,’ they are far more likely to agree to the procedure than when they are told, ‘Of those who undergo this procedure, 10 percent are dead after five years’ (Redelmeier, Rozin, & Kahneman, 1993, p. 73).
Article
Source: Democracy Now! JUAN GONZALEZ: When you follow your friends on Facebook or run a search on Google, what information comes up, and what gets left out? That's the subject of a new book by Eli Pariser called The Filter Bubble: What the Internet Is Hiding from You. According to Pariser, the internet is increasingly becoming an echo chamber in which websites tailor information according to the preferences they detect in each viewer. Yahoo! News tracks which articles we read. Zappos registers the type of shoes we prefer. And Netflix stores data on each movie we select. AMY GOODMAN: The top 50 websites collect an average of 64 bits of personal information each time we visit and then custom-design their sites to conform to our perceived preferences. While these websites profit from tailoring their advertisements to specific visitors, users pay a big price for living in an information bubble outside of their control. Instead of gaining wide exposure to diverse information, we're subjected to narrow online filters. Eli Pariser is the author of The Filter Bubble: What the Internet Is Hiding from You. He is also the board president and former executive director of the group MoveOn.org. Eli joins us in the New York studio right now after a whirlwind tour through the United States.
Article
In 2011, researchers at Bucknell University and Illinois Wesleyan University compared the search efficacy of Serial Solutions Summon, EBSCO Discovery Service, Google Scholar and conventional library databases. Using a mixed-methods approach, qualitative and quantitative data was gathered on students' usage of these tools. Regardless of the search system, students exhibited a marked inability to effectively evaluate sources and a heavy reliance on default search settings. On the quantitative benchmarks measured by this study, the EBSCO Discovery Service tool outperformed the other search systems in almost every category. This article describes these results and makes recommendations for libraries considering these tools.
Article
Web-scale discovery has arrived. With products like Summon and WorldCat Local, hundreds of millions of articles and books are accessible at lightning speed from a single search box via the library. But there's a catch. As the size of the index grows, so too does the challenge of relevancy. When Google launched in 1998 with an index of only 25 million pages, its patented PageRank algorithm was powerful enough to provide outstanding results. But the web has grown to well over a trillion pages, and Google now employs over 200 different signals to determine what search results you see. According to Eli Pariser, author of "The filter bubble: what the internet is hiding from you" (Penguin, 2011), a growing number of these signals are based on what Google knows about you, especially your web history; and, according to Pariser, serving up information that's "pleasant and familiar and confirms your beliefs" is becoming increasingly synonymous with relevancy. This session will critique Pariser's concept of the 'filter bubble' in terms of collection development and the possible evolutions of discovery layers like Summon and WorldCat Local, and the challenge of providing relevant academic research results in a web-scale world where students increasingly expect the kind of personalization sometimes at odds with academia's adherence to privacy and intellectual freedom.
Article
Purpose: The purpose of the study was to compare an internet search engine, Google, with appropriate library databases and systems, in order to assess the relative value, strengths and weaknesses of the two sorts of system.
Design/methodology/approach: A case study approach was used, with detailed analysis and failure checking of results. The performance of the two systems was assessed in terms of coverage, unique records, precision, and quality and accessibility of results. A novel form of relevance assessment, based on the work of Saracevic and others, was devised.
Findings: Google is superior for coverage and accessibility. Library systems are superior for quality of results. Precision is similar for both systems. Good coverage requires use of both, as both have many unique items. Improving the skills of the searcher is likely to give better results from the library systems, but not from Google.
Research limitations/implications: Only four case studies were included. These were limited to the kind of queries likely to be searched by university students. Library resources were limited to those in two UK academic libraries. Only the basic Google web search functionality was used, and only the top ten records examined.
Practical implications: The results offer guidance for those providing support and training for use of these retrieval systems, and also provide evidence for debates on the "Google phenomenon".
Originality/value: This is one of the few studies which provide evidence on the relative performance of internet search engines and library databases, and the only one to conduct such in-depth case studies. The method for the assessment of relevance is novel.
Article
This article analyzes the concept of the Invisible Web and its implication for academic librarianship. It offers a guide to tools that can be used to mine the Invisible Web and discusses the benefits of using the Invisible Web to promote interest in library services. In addition, the article includes an expanded definition, a literature review, and suggestions for ways in which to incorporate the Invisible Web in reference work and library promotion.
Article
Various features of innovative internet search engines that deliver customized results are discussed. Mooter, a new search engine, simplifies the user's assessment of results by categorizing the collected information and clustering related sites under on-screen buttons. Kartoo, a metasearch engine, submits the user's query to other search engines and provides aggregated results in a visual form. The next search engines are expected to improve results by digging deeper through online materials and by monitoring users' interests to respond more intelligently to future searches.
Article
Google Scholar has been met with both enthusiasm and criticism since its introduction in 2004. This search engine provides a simple way to access "peer-reviewed papers, theses, books, abstracts, and articles from academic publishers' sites, professional societies, preprint repositories, universities and other scholarly organizations" [1]. An obvious strength of Google Scholar is its intuitive interface, as the main search engine interface consists of a simple query box. In contrast, databases, such as PubMed, utilize search interfaces that offer a greater variety of advanced features. These additional features, while powerful, often lead to a complexity that may require a substantial investment of time to master. It has been observed that Google Scholar may allow searchers to "find some resources they can use rather than be frustrated by a database's search screen" [2]. Some even feel that "Google Scholar's simplicity may eventually consume PubMed" [3]. Along with ease of use, Google Scholar carries the familiar "Google" brand name. As Kennedy and Price so aptly stated, "College students AND professors might not know that library databases exist, but they sure know Google" [4]. The familiarity of Google may allow librarians and educators to ease students into the scholarly searching process by starting with Google Scholar and eventually moving to more complex systems. Felter noted that "as researchers work with Google Scholar and reach limitations of searching capabilities and options, they may become more receptive to other products" [5]. Google Scholar is also thought to provide increased access to gray literature [2], as it retrieves more than journal articles and includes preprint archives, conference proceedings, and institutional repositories [6]. Google Scholar also includes links to the online collections of some academic libraries. Including these access points in Google Scholar retrieval sets may ultimately help more users reach more of their own institution's subscriptions [7]. While its advantages are substantial, Google Scholar is not without flaws. The shortcomings of the system and its search interface have been well documented in the literature and include lack of reliable advanced search functions, lack of controlled vocabulary, and issues regarding scope of coverage and currency. Table 1 summarizes some of the reported criticisms of Google Scholar. Vine found that while Google Scholar pulls in data from PubMed, many PubMed records are missing [20], and that Google Scholar also lacks features available in MEDLINE [12]. Others have noted that Google Scholar should not be the first or sole choice when searching for patient care information, clinical trials, or literature reviews [23,24]. Thorough review and testing of Google Scholar, an approach similar to that used to evaluate licensed resources, is necessary to better understand its strengths and limitations. As Jacso states, "professional searchers must do sample test searches and correctly interpret the results to corroborate claims and get factual information about databases" [18]. This paper compares and contrasts a variety of test searches in PubMed and Google Scholar to gain a better understanding of Google Scholar's searching capabilities.
Filter Bubble and Enframing: On the Self-Affirming Dynamics of Technologies
  • A Beinsteiner
Beinsteiner, A. (2013). Filter Bubble and Enframing: On the Self-Affirming Dynamics of Technologies. Retrieved April 13, 2016, from http://ceur-ws.org/Vol-859/paper3.pdf
Your results may vary: will the information superhighway turn into a cul-de-sac because of automated filters?
  • P Boutin
Boutin, P. (2011, May 20). Your results may vary: will the information superhighway turn into a cul-de-sac because of automated filters? Retrieved April 13, 2016, from http://www.wsj.com/articles/SB1
ComScore Releases Desktop Search Engine Ranking
  • Comscore
ComScore. (2015). ComScore Releases March 2015: U.S. Desktop Search Engine Ranking. Retrieved April 13, 2016, from https://www.comscore.com/lat/Insights/Market-Rankings/comScore-Releases-March-2015-US-Desktop-Search-Engine-Rankings
My Google search results are different to yours
  • R Garavaglia
Garavaglia, R. (2011). My Google search results are different to yours. Retrieved April 13, 2016, from http://bonzamobilecomputerrepairs.com/blog/?p=4657
16 differences between Google mobile & desktop search results in 2012
  • B Meunier
Meunier, B. (2012). 16 differences between Google mobile & desktop search results in 2012. Retrieved April 13, 2016, from http://searchengineland.com/16-differences-between-google-mobile-desktop-search-results-in-2012-130463
Search engine use report
  • Pew
Pew. (2012). Search engine use report. Retrieved April 13, 2016, from http://www.pewinternet.org/files/old-media//Files/Reports/2012/PIP_Search_Engine_Use_2012.pdf
Publish or perish? The rise of the fractional author
  • A Plume
  • D Van Weijen
Plume, A., & van Weijen, D. (2014). Publish or perish? The rise of the fractional author. Retrieved April 13, 2016, from http://www.researchtrends.com/issue-38-september-2014/publish-or-perish-therise-of-the-fractional-author/
Adoption and use of Web 2.0 in scholarly communications
Adoption and use of Web 2.0 in scholarly communications. Philosophical Transactions of the Royal Society A, 368(1926), 4039-4056.
Reasons your Google search results are different than mine
  • S Snipes
Snipes, S. (2012). Reasons your Google search results are different than mine. Retrieved April 13, 2016, from http://themetaq.com/articles/reasons-your-google-search-results-are-different-than-mine
Bubble trouble: is web personalisation turning us into solipsistic twits?
  • J Weisberg
Weisberg, J. (2011). Bubble trouble: is web personalisation turning us into solipsistic twits? Retrieved April 13, 2016, from http://www.slate.com/articles/news_and_politics/the_big_idea/2011/06/bubble_trouble.html
I still like Google: University student perceptions of searching OPACs and the web
  • K V Fast
  • D G Campbell
Fast, K. V., & Campbell, D. G. (2004). I still like Google: University student perceptions of searching OPACs and the web. Proceedings of the American Society for Information Science and Technology, 41(1), 138-146. doi:10.1002/meet.1450410116
Bringing smarter computing to big data
  • Ibm
IBM. (2011). Bringing smarter computing to big data. Retrieved April 13, 2016, from https://www.ibm.com/smarterplanet/global/files/us__en_us__smarter_computing__ibm_data_final.pdf
Preferences, paternalism and liberty. Royal Institute of Philosophy Supplements
  • C R Sunstein
Sunstein, C. R. (2006). Preferences, paternalism and liberty. Royal Institute of Philosophy Supplements, 59, 233-264. doi:10.1017/S135824610605911X
Comparative Recall and Precision of Simple and Expert Searches in Google Scholar and Eight Other Databases
  • W H Walters
Walters, W. H. (2011). Comparative Recall and Precision of Simple and Expert Searches in Google Scholar and Eight Other Databases. portal: Libraries and the Academy, 11(4), 971-1006. doi:10.1353/pla.2011.0042
Science, sort of: Google experiment disproves "confirmation bias"
  • A Zwissler
Zwissler, A. (2011). Science, sort of: Google experiment disproves "confirmation bias". Retrieved April 13, 2016, from http://www.contracostatimes.com/science/ci_18677465?nclick_check=1
Different Search Results for the Same Term on Google
  • J Titsworth
Titsworth, J. (2010). Different Search Results for the Same Term on Google. Retrieved April 13, 2016, from http://www.searchenginejournal.com/different-search-results-for-the-same-term-on-google/18905/