Conference Paper

Where does Google find API documentation?

Authors: Christoph Treude, Maurício Aniche

Abstract

The documentation of popular APIs is spread across many formats, from vendor-curated reference documentation to Stack Overflow threads. For developers, it is often not obvious from where a particular piece of information can be retrieved. To understand this documentation landscape, we systematically conducted Google searches for the elements of ten popular APIs. We found that their documentation is widely dispersed among many sources, that GitHub and Stack Overflow play a prominent role among the search results, and that most sources are quick to document new API functionalities.

... The Google search engine indexes millions of webpages that include code examples [90]. Naturally, pages with better content are likely to be ranked at the top, attracting more attention and clicks from users [23, 35, 90]. In practice, many factors may influence the rank: page reputation, page domain, and content quality, to name a few [20, 23, 29, 45]. ...
... We look for code examples on the web by querying Google (in a private browser session) for the API's full name followed by the word "example". The word "example" is used to maximize the chance of finding proper API examples rather than API documentation [90]. For instance, for the Guava API Ints.concat, we query for "com.google.common.collect.Ints.concat ...
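The query construction described above is simple enough to sketch. The Python snippet below is an illustrative assumption, not the cited authors' tooling: it appends a hint word such as "example" (or "example in java", as in another study cited on this page) to a fully qualified API name and builds a Google search link from it. The `https://www.google.com/search?q=` URL shape and the sample API name are assumptions; the code only builds strings and fetches nothing.

```python
from urllib.parse import urlencode


def build_example_query(api_full_name: str, hint: str = "example") -> str:
    """Return the API's fully qualified name followed by a hint word."""
    # The hint steers results toward code examples rather than reference docs.
    return f"{api_full_name} {hint}"


def search_url(query: str) -> str:
    """Build (but do not fetch) a Google web-search link for the query."""
    # urlencode escapes the query; spaces become '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_example_query("java.util.Collections.sort")
    print(query)              # java.util.Collections.sort example
    print(search_url(query))  # https://www.google.com/search?q=java.util.Collections.sort+example
```

Passing `hint="example in java"` reproduces the query variant used by the study that restricts results to Java snippets.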
Article
Developers often look for code examples on the web to improve learning and accelerate development. Google indexes millions of pages with code examples: pages with better content are likely to be top ranked. In practice, many factors may influence the rank: page reputation, content quality, etc. Consequently, the most relevant information on the page, i.e., the code example, may be overshadowed by the search engine. Thus, a better understanding of how Google would rank code examples in isolation may provide the basis to detect its strengths and limitations in dealing with such content. In this paper, we assess how the Google search engine ranks code examples. We build a website with 1,000 examples and submit it to Google. After it is fully indexed, we query and analyze the returned examples. We find that pages with multiple code examples are more likely to be top ranked by Google. Overall, single code examples that are ranked higher are larger; however, they are not necessarily more readable and reusable. We predict top-ranked examples with a good level of confidence, but generic factors have more importance than code quality ones. Based on our results, we provide insights for researchers and practitioners.
... Prior work reports that developers may spend up to 20% of their time on this task [6]-[8]. Many websites serve this purpose [9]. Stack Overflow [10], for instance, receives over 50 million users per month and is the 44th most visited website in the world. ...
... Lastly, from the Google search results, we collect the top 10 returned links and analyze their source websites. Rationale: detecting software resources on the web informs software vendors where developers find information about their products, tells developers where they can find software information, and helps researchers study the software resource landscape on the web [9], [46]. ...
... Also, tools that rely on web search (e.g., [24]-[27]) may be affected by those omissions and should therefore account for them to avoid returning noisy results to developers. Priority of Google to Stack Overflow (RQ5): It is a fact that Stack Overflow plays an important role in software development nowadays [9], [47], [48]. Our findings reinforce this statement: we detect that Stack Overflow is the most frequent source of the software resources Google finds (11%), with an over-concentration in the top-1 results (28%). ...
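To make the two numbers above concrete (a domain's share of all collected results versus its share of rank-1 results), here is a minimal Python sketch. It assumes the results are already collected as ranked URL lists, one per query, and groups them by host name; the toy URLs are hypothetical and are not data from the cited study.

```python
from collections import Counter
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Host name of a URL, lower-cased, with a leading 'www.' removed."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_shares(result_lists):
    """Share of each domain among all results and among rank-1 results.

    result_lists holds one ranked list of URLs per query (e.g. the top 10).
    """
    all_hits, top1_hits = Counter(), Counter()
    for results in result_lists:
        all_hits.update(domain(url) for url in results)
        if results:
            top1_hits[domain(results[0])] += 1
    total = sum(all_hits.values())
    n_top1 = sum(top1_hits.values())
    overall = {d: c / total for d, c in all_hits.items()}
    top1 = {d: c / n_top1 for d, c in top1_hits.items()}
    return overall, top1


if __name__ == "__main__":
    results = [
        ["https://stackoverflow.com/q/1", "https://github.com/a/b", "https://docs.oracle.com/x"],
        ["https://www.github.com/c/d", "https://stackoverflow.com/q/2"],
    ]
    overall, top1 = domain_shares(results)
    print(overall["stackoverflow.com"], top1["stackoverflow.com"])  # 0.4 0.5
```

A domain whose rank-1 share clearly exceeds its overall share is "over-concentrated" at the top in the sense used above.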
... Google Apps provides services such as email, Classroom, Calendar, Google Drive, and Sites. In addition, Google Apps also provides Application Programming Interfaces (APIs) that application developers can use to access various services provided by Google (Treude & Aniche, 2018). An API is a collection of commands/functions that allows a system to interact with other systems (Lensakom, 2016). ...
... In the digital field, Google provides many services, including the APIs of the Google framework. Currently, Google provides various APIs that users and application developers can use to access Google services such as email, Google Maps, Google Classroom, and Google Docs through third-party applications (Hairah & Budiman, 2017; Treude & Aniche, 2018). ...
Article
Abstract: Google Apps is a service provided by Google that allows users to use Google products with their own domain names. Among the products offered by Google Apps are email (Gmail), Docs (Google Drive), and Classroom services. In addition, Google Apps also provides Application Programming Interface (API) services that can be used by developers to take advantage of various features provided by Google. Universitas Ubudiyah Indonesia (UUI) is one of the universities that use Google Apps service for managing student emails. At present, UUI student email management through Google Apps is still not integrated with academic information system data. As a result, UUI must allocate special resources for managing student emails manually. Based on these problems, this study proposes an integration system for UUI student email management using the Google Apps API. This system is designed using PHP programming. The Google Apps API authentication method uses OAuth 2.0. The results of this study indicate that student email management on Google Apps can be done through campus academic information systems. With this system, students can activate email independently without having to be registered manually to the Google Apps page by the campus email managers.
... On GitHub's gist system, developers have shared over 300K public Python code snippets [3]. Both feature prominently in search results for API documentation [5], and a study by Yang et al. found over 4M code blocks from Stack Overflow snippets that had been reused in public GitHub projects [6]. Recently, code snippets in Jupyter notebooks have become a standard for sharing and replicating scientific work and more [4]. ...
... Some notebook files were completely empty, or were identified as using Jupyter kernels for other languages, such as R, meaning V2 could not parse the notebook source code as Python. Other notebooks contained IPython magics, which are not syntactically valid Python and therefore made parsing difficult. ...
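Why magics break parsing can be shown with Python's standard `ast` module. The sketch below is not V2's implementation; it is an assumed, deliberately crude pre-processing step that checks whether a cell parses and drops lines that use IPython-only prefixes before retrying.

```python
import ast


def parses_as_python(source: str) -> bool:
    """True if the cell source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def strip_magics(source: str) -> str:
    """Crudely remove IPython-only syntax from one notebook cell.

    A cell whose first line is a cell magic (%%...) is dropped entirely;
    otherwise line magics (%...) and shell escapes (!...) are removed.
    """
    lines = source.splitlines()
    if lines and lines[0].lstrip().startswith("%%"):
        return ""
    kept = [ln for ln in lines if not ln.lstrip().startswith(("%", "!"))]
    return "\n".join(kept)


if __name__ == "__main__":
    cell = "%matplotlib inline\nimport math\nprint(math.pi)"
    print(parses_as_python(cell))                # False
    print(parses_as_python(strip_magics(cell)))  # True
```

The heuristic misses forms such as `files = !ls` or automagic calls written without `%`, so a robust tool would more likely reuse IPython's own input transformation machinery.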
Preprint
Code snippets are prevalent, but are hard to reuse because they often lack an accompanying environment configuration. Most are not actively maintained, allowing for drift between the most recent possible configuration and the code snippet as the snippet becomes out-of-date over time. Recent work has identified the problem of validating and detecting out-of-date code snippets as the most important consideration for code reuse. However, determining if a snippet is correct, but simply out-of-date, is a non-trivial task. In the best case, breaking changes are well documented, allowing developers to manually determine when a code snippet contains an out-of-date API usage. In the worst case, determining if and when a breaking change was made requires an exhaustive search through previous dependency versions. We present V2, a strategy for determining if a code snippet is out-of-date by detecting discrete instances of configuration drift, where the snippet uses an API which has since undergone a breaking change. Each instance of configuration drift is classified by a failure encountered during validation and a configuration patch, consisting of dependency version changes, which fixes the underlying fault. V2 uses feedback-directed search to explore the possible configuration space for a code snippet, reducing the number of potential environment configurations that need to be validated. When run on a corpus of public Python snippets from prior research, V2 identifies 248 instances of configuration drift.
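The abstract contrasts exhaustive search through previous dependency versions with V2's feedback-directed search. As a simpler point of comparison, and explicitly not V2's algorithm, the sketch below assumes that a snippet works on every version up to some release and fails on all later ones; under that monotonicity assumption, binary search locates the first breaking version with about log2(n) validations instead of n. The `works` callback is hypothetical: in practice it would install the given version and run the snippet.

```python
from typing import Callable, Optional, Sequence


def first_breaking_version(
    versions: Sequence[str], works: Callable[[str], bool]
) -> Optional[str]:
    """Earliest version on which the snippet fails, or None if it never fails.

    Assumes versions are ordered oldest-first and that, once the snippet
    breaks, it stays broken in every later version.
    """
    if not versions or works(versions[-1]):
        return None
    lo, hi = -1, len(versions) - 1
    # Invariant: versions[hi] fails; every version at index <= lo works
    # (lo == -1 means no working version has been confirmed yet).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


if __name__ == "__main__":
    releases = ["1.0", "1.1", "1.2", "2.0", "2.1"]
    broken = {"2.0", "2.1"}  # toy ground truth for illustration only
    print(first_breaking_version(releases, lambda v: v not in broken))  # 2.0
```

Real dependency histories are often not monotonic (a break can be reverted, or several packages interact), which is one reason a search that uses validation feedback across the whole configuration space can be preferable.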
... Many papers concentrate on API documentation, motivated by incomplete documentation [51], the challenge of producing good documentation [135], and the shift of API documentation to more social sources [134]. A case study with GitHub and Stack Overflow to locate information from 10 popular APIs found that GitHub and Stack Overflow are often used by Google to document new functionalities [176]. An empirical study that combines API patterns extracted from GitHub projects to determine whether Stack Overflow posts present faulty API code found that up to 31% of posts may have potential API violations [205]. ...
Article
Recent software advances have led to an expansion of the development and usage of application programming interfaces (APIs). From millions of Android packages (APKs) available on Google Play to millions of open-source packages available in Maven, PyPI, and npm, APIs have become an integral part of software development. Like any software artifact, software APIs evolve and suffer from this evolution. Prior research has uncovered many challenges to the development, usage, and evolution of APIs. While some challenges have been studied and solved, many remain. These challenges are scattered in the literature, which hides advances and cloaks the remaining challenges. In this systematic literature review on APIs and API evolution, we uncover and describe publication trends and trending topics. We compile common research goals, evaluation methods, metrics, and subjects. We summarize the current state-of-the-art and outline known existing challenges as well as new challenges uncovered during this review. We conclude that the main remaining challenges related to APIs and API evolution are (1) automatically identifying and leveraging factors that drive API changes, (2) creating and using uniform benchmarks for research evaluation, and (3) understanding the impact of API evolution on API developers and users with respect to various programming languages.
... The popularity of Stack Overflow can be attributed to two reasons. First, Stack Overflow is often placed prominently on the first page of Google's search results when searching for documentation (Treude and Aniche 2018), biasing users to click on this resource: "Whenever I search for any results in the domain software development, I tend to see more of [particular] resources, for example, Stackoverflow comes first." [P10] Second, Stack Overflow acts as a hub for programmers: "As a practice, I always tend to open the first search result and most of the time it happens to be from Stack Overflow, and as we know, Stack Overflow is the go-to place for us [programmers]." ...
Article
When learning a new technology, programmers often have to sift through multiple online resources to find information that addresses their questions. Prior work has reported that information seekers use a number of different strategies, including following scents, or indicators, to locate appropriate resources. We present a qualitative and quantitative investigation of how programmers learning a new technology employ these strategies to navigate between online resources and evaluate the pertinence of these resources. We performed a diary and interview study with ten programmers learning a new technology, to study how users navigate from the question they have to the resource that satisfies this need. Based on our observations, we propose a resource-seeking model that represents the online resource seeking behaviour of programmers when learning a new technology. The model comprises six components that can be divided into two groups: Need-oriented components, i.e. Questions, Preferences, and Beliefs, and Resource-oriented components, i.e. Resources, Cues, and Impression Factors. We identified nine relations between these components and studied how the components are associated. We report on the characteristics of the components and the relationships between them, and discuss the importance of search customization and other implications of our observations for resource creators and search tools.
... These queries receive the additional tokens "example in java". The "example" token aims to find code snippets instead of only explanations [19]. The "in java" tokens aim to find code snippets written in Java. ...
Preprint
Developers often search for reusable code snippets on general-purpose web search engines like Google, Yahoo! or Microsoft Bing. But some of these code snippets may have poor quality in terms of readability or understandability. In this paper, we present an empirical analysis of the readability and understandability scores of snippets extracted from the web, using three independent variables: ranking, general-purpose web search engine, and recommended site. We collected the top-5 recommended sites and their respective code snippet recommendations using Google, Yahoo!, and Bing for 9,480 queries, and evaluated their readability and understandability scores. We found that some recommended sites have significantly better readability and understandability scores than others. The better-ranked code snippet is not necessarily more readable or understandable than a lower-ranked code snippet for all general-purpose web search engines. Moreover, considering the readability score, Google has better-ranked code snippets compared to Yahoo! or Microsoft Bing.
... Furthermore, our inference procedure only requires a code snippet, and could easily be modified to work with another context, such as code snippets in Stack Overflow answers, blog posts [12], or online documentation [13]. Gistable focuses on configuring and running single file scripts. ...
Preprint
Software developers create and share code online to demonstrate programming language concepts and programming tasks. Code snippets can be a useful way to explain and demonstrate a programming concept, but may not always be directly executable. A code snippet can contain parse errors, or fail to execute if the environment contains unmet dependencies. This paper presents an empirical analysis of the executable status of Python code snippets shared through the GitHub gist system, and the ability of developers familiar with software configuration to correctly configure and run them. We find that 75.6% of gists require non-trivial configuration to overcome missing dependencies, configuration files, reliance on a specific operating system, or some other environment configuration. Our study also suggests the natural assumption developers make about resource names when resolving configuration errors is correct less than half the time. We also present Gistable, a database and extensible framework built on GitHub's gist system, which provides executable code snippets to enable reproducible studies in software engineering. Gistable contains 10,259 code snippets, approximately 5,000 with a Dockerfile to configure and execute them without import error. Gistable is publicly available at https://github.com/gistable/gistable.
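The most common obstacle reported above, missing dependencies, can be partly detected statically. This Python sketch is an assumption for illustration, not Gistable's code: it collects a snippet's absolute top-level imports with `ast` and reports those that `importlib.util.find_spec` cannot locate in the current environment.

```python
import ast
import importlib.util


def top_level_imports(source: str) -> set[str]:
    """Top-level module names imported by absolute import statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def missing_imports(source: str) -> set[str]:
    """Imported top-level modules that cannot be found in this environment."""
    return {name for name in top_level_imports(source)
            if importlib.util.find_spec(name) is None}


if __name__ == "__main__":
    snippet = "import os\nfrom collections import Counter\nimport no_such_pkg_xyz.sub\n"
    print(sorted(missing_imports(snippet)))  # ['no_such_pkg_xyz']
```

Knowing the missing module name is not the same as knowing what to install: for example, `import yaml` is provided by the PyYAML distribution and `import sklearn` by scikit-learn, which is one way the natural assumption about resource names can go wrong.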
Conference Paper
Developers often search for reusable code snippets on general-purpose web search engines like Google, Yahoo! or Microsoft Bing. But some of these code snippets may have poor quality in terms of readability or understandability. In this paper, we present an empirical analysis of the readability and understandability scores of snippets extracted from the web, using three independent variables: ranking, general-purpose web search engine, and recommended site. We collected the top-5 recommended sites and their respective code snippet recommendations using Google, Yahoo!, and Bing for 9,480 queries, and evaluated their readability and understandability scores. We found that some recommended sites have significantly better readability and understandability scores than others. The better-ranked code snippet is not necessarily more readable or understandable than a lower-ranked code snippet for all general-purpose web search engines. Moreover, considering the readability score, Google has better-ranked code snippets compared to Yahoo! or Microsoft Bing.
Article
The purpose of this study is to perform a synthesis of API research. The study took stock of literature from academic journals on APIs with their associated themes, frameworks, methodologies, publication outlets and level of analysis. The authors draw on a total of 104 articles from academic journals and conferences published from 2010 to 2018. A systematic literature review was conducted on the selected articles. The findings suggest that API research is primarily atheoretical and largely focuses on the technological dimensions such as design and usage; thus, neglecting most of the social issues such as the business and managerial applications of APIs, which are equally important. Future research directions are provided concerning the gaps identified.
Conference Paper
Many developers rely on modern news aggregator sites such as Reddit and Hacker News to stay up to date with the latest technological developments and trends. In order to understand what motivates developers to contribute, what kind of content is shared, and how knowledge is shaped by the community, we interviewed and surveyed developers that participate on the Reddit programming subreddit and we analyzed a sample of posts on both Reddit and Hacker News. We learned what kind of content is shared in these websites and developer motivations for posting, sharing, discussing, evaluating, and aggregating knowledge on these aggregators, while revealing challenges developers face in terms of how content and participant behavior is moderated. Our insights aim to improve the practices developers follow when using news aggregators, as well as guide tool makers on how to improve their tools. Our findings are also relevant to researchers that study developer communities of practice.
Article
Formal documentation can be a crucial resource for learning how to use an API. However, producing high-quality documentation can be nontrivial. Researchers investigated how 10 common documentation problems manifested themselves in practice. The results are based on two surveys of a total of 323 professional software developers and analysis of 179 API documentation units. The three most severe problems were ambiguity, incompleteness, and incorrectness of content. The respondents often mentioned six of the 10 problems as "blockers" that forced them to use another API.
Conference Paper
Why do software developers place so much effort into writing public blog posts about their knowledge, experiences, and opinions on software development? What are the benefits, problems, and tools needed? What can the research community do to help? In this paper, we describe a research agenda aimed at understanding the motivations and issues of software development blogging. We interviewed developers as well as mined and analyzed their blog posts. For this initial study, we selected developers from various backgrounds: IDE plugin development, mobile development, and web development. We found that developers used blogging for a variety of functions such as documentation, technology discussion, and announcing progress. They were motivated by a variety of reasons such as personal branding, knowledge retention, and feedback. Among the challenges for blog authors identified in our initial study, we found primitive tool support, difficulty recreating and recalling recent development experiences, and management of blog comments. Finally, many developers expressed that the motivations and benefits they received for blogging in public did not directly translate to corporate settings.
Conference Paper
Deprecation is a language feature that allows API producers to mark a feature as obsolete. We aim to gain a deep understanding of the needs of API producers and consumers alike regarding deprecation. To that end, we investigate why API producers deprecate features, whether they remove deprecated features, how they expect consumers to react, and what prompts an API consumer to react to deprecation. To achieve this goal we conduct semi-structured interviews with 17 third-party Java API producers and survey 170 Java developers. We observe that the current deprecation mechanism in Java and the proposal to enhance it do not address all of developers' needs. This leads us to propose and evaluate three further enhancements to the deprecation mechanism.
Conference Paper
Software developers need access to different kinds of information which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with "insight sentences" from Stack Overflow—sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. Based on a development set of 1,574 sentences, we compare the performance of two state-of-the-art summarization techniques as well as a pattern-based approach for insight sentence extraction. We then present SISE, a novel machine learning based approach that uses as features the sentences themselves, their formatting, their question, their answer, and their authors as well as part-of-speech tags and the similarity of a sentence to the corresponding API documentation. With SISE, we were able to achieve a precision of 0.64 and a coverage of 0.7 on the development set. In a comparative study with eight software developers, we found that SISE resulted in the highest number of sentences that were considered to add useful information not found in the API documentation. These results indicate that taking into account the metadata available on Stack Overflow as well as part-of-speech tags can significantly improve unsupervised extraction approaches when applied to Stack Overflow data.
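One of SISE's features is the similarity of a sentence to the corresponding API documentation. The abstract does not say how that similarity is computed, so the sketch below assumes plain bag-of-words cosine similarity over lower-cased alphanumeric tokens; under that assumption, a low score can hint that a Stack Overflow sentence says something the reference documentation does not.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Counts of lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two bag-of-words vectors (0.0 if either is empty)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    doc = "Returns the number of elements in this list."
    sentence = "Note that size() is O(1) for ArrayList but may be O(n) for linked lists."
    print(round(cosine_similarity(doc, sentence), 3))
```

A learned approach like SISE would combine such a score with the other listed features rather than use it as a threshold on its own.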
Article
The open source community, as well as numerous technical blogs and community web sites, put online vast quantities of free source code, ranging from snippets to full-blown products. This code embodies the software development community's domain knowledge, and mirrors the structure of the Internet: it is distributed rather than hierarchical; it is chaotic, incomplete, and inconsistent. StackOverflow.com is a Question and Answer (Q&A) website which uses social media to facilitate knowledge exchange between programmers by mitigating the pitfalls involved in using code from the Internet. Its design nurtures a community of developers, and enables crowd sourced software engineering activities ranging from documentation to providing useful, high quality code snippets to be used in production. In this chapter we review Stack Overflow from three perspectives: (1) its design and its social media characteristics, (2) the role it plays in the software documentation landscape, and (3) the use of Stack Overflow in the context of the example centric programming paradigm. © Springer Science+Business Media New York 2013. All rights are reserved.
Article
The paper discusses the application program interface (API). Most software projects reuse components exposed through APIs. In fact, current-day software development technologies are becoming inseparable from the large APIs they provide. An API is the interface to implemented functionality that developers can access to perform various tasks. APIs support code reuse, provide high-level abstractions that facilitate programming tasks, and help unify the programming experience. A study of obstacles that professional Microsoft developers faced when learning to use APIs uncovered challenges and resulting implications for API users and designers. The article focuses on the obstacles to learning an API. Although learnability is only one dimension of usability, there's a clear relationship between the two, in that difficult-to-use APIs are likely to be difficult to learn as well. Many API usability studies focus on situations where developers are learning to use an API. The author concludes that as APIs keep growing larger, developers will need to learn a proportionally smaller fraction of the whole. In such situations, the way to foster more efficient API learning experiences is to include more sophisticated means for developers to identify the information and the resources they need, even for well-designed and documented APIs.
Article
Reading reference documentation is an important part of programming with application programming interfaces (APIs). Reference documentation complements the API by providing information not obvious from the API syntax. To improve the quality of reference documentation and the efficiency with which the relevant information it contains can be accessed, we must first understand its content. We report on a study of the nature and organization of knowledge contained in the reference documentation of the hundreds of APIs provided as a part of two major technology platforms: Java SDK 6 and .NET 4.0. Our study involved the development of a taxonomy of knowledge types based on grounded methods and independent empirical validation. Seventeen trained coders used the taxonomy to rate a total of 5,574 randomly sampled documentation units to assess the knowledge they contain. Our results provide a comprehensive perspective on the patterns of knowledge in API documentation: observations about the types of knowledge it contains and how this knowledge is distributed throughout the documentation. The taxonomy and patterns of knowledge we present in this paper can be used to help practitioners evaluate the content of their API documentation, better organize their documentation, and limit the amount of low-value content. They also provide a vocabulary that can help structure and facilitate discussions about the content of APIs.
Conference Paper
Knowledge management plays an important role in many software organizations. Knowledge can be captured and distributed using a variety of media, including traditional help files and manuals, videos, technical articles, wikis, and blogs. In recent years, web-based community portals have emerged as an important mechanism for combining various communication channels. However, there is little advice on how they can be effectively deployed in a software project. In this paper, we present a first study of a community portal used by a closed source software project. Using grounded theory, we develop a model that characterizes documentation artifacts along several dimensions, such as content type, intended audience, feedback options, and review mechanisms. Our findings lead to actionable advice for industry by articulating the benefits and possible shortcomings of the various communication channels in a knowledge-sharing portal. We conclude by suggesting future research on the increasing adoption of community portals in software engineering projects.
Article
In this paper we present an analysis of an AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents almost 285 million user sessions, each an attempt to fill a single information need. We present an analysis of individual queries, query duplication, and query sessions. We also present results of a correlation analysis of the log entries, studying the interaction of terms within queries. Our data supports the conjecture that web users differ significantly from the user assumed in the standard information retrieval literature. Specifically, we show that web users type in short queries, mostly look at the first 10 results only, and seldom modify the query. This suggests that traditional information retrieval techniques may not work well for answering web search requests. The correlation analysis showed that the most highly correlated items are constituents of phrases. This result indicates it may be useful for search engines to consider search terms as parts of phrases even if the user did not explicitly specify them as such.
Article
Large APIs can be hard to learn, and this can lead to decreased programmer productivity. But what makes APIs hard to learn? We conducted a mixed-methods, multi-phase study of the obstacles faced by Microsoft developers learning a wide variety of new APIs. The study involved a combination of surveys and in-person interviews, and collected the opinions and experiences of over 440 professional developers. We found that some of the most severe obstacles faced by developers learning new APIs pertained to the documentation and other learning resources. We report on the obstacles developers face when learning new APIs, with a special focus on obstacles related to API documentation. Our qualitative analysis elicited five important factors to consider when designing API documentation: documentation of intent; code examples; matching APIs with scenarios; penetrability of the API; and format and presentation. We analyzed how these factors can be interpreted to prioritize API documentation development efforts.
Conference Paper
Software development blogs, developer forums and Q&A websites are changing the way software is documented. With these tools, developers can create and communicate knowledge and experiences without relying on a central authority to provide official documentation. Instead, any content created by a developer is just a web search away. To understand whether documentation via social media can replace or augment more traditional forms of documentation, we study the extent to which the methods of one particular API, jQuery, are documented on the Web. We analyze 1,730 search results and show that software development blogs in particular cover 87.9% of the API methods, mainly featuring tutorials and personal experiences about using the methods. Further, this effort is shared by a large group of developers contributing just a few blog posts. Our findings indicate that social media is more than a niche in software documentation, that it can provide high levels of coverage and that it gives readers a chance to engage with authors.
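The 87.9% figure is a coverage ratio: the share of API methods for which at least one search result comes from a given kind of source. A minimal Python sketch, assuming the results have already been collected per method and manually labeled by source type (the method names, URLs, and labels below are toy values, not the study's data):

```python
def coverage(api_methods, results_by_method, source_type):
    """Fraction of API methods with at least one result of the given source type.

    results_by_method maps a method name to a list of (url, source_type) pairs.
    """
    if not api_methods:
        return 0.0
    covered = sum(
        1
        for method in api_methods
        if any(kind == source_type for _, kind in results_by_method.get(method, []))
    )
    return covered / len(api_methods)


if __name__ == "__main__":
    methods = ["addClass", "removeClass", "toggleClass", "attr"]
    results = {
        "addClass": [("https://blog.example.com/addclass", "blog")],
        "removeClass": [("https://forum.example.com/t/1", "forum")],
        "toggleClass": [
            ("https://blog.example.org/toggle", "blog"),
            ("https://api.example.com/toggleClass", "official"),
        ],
    }
    print(coverage(methods, results, "blog"))  # 0.5
```

Methods with no collected results (here `attr`) count as uncovered, so the ratio is always relative to the full list of API methods.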