Detecting Influenza Epidemics Using Search Engine Query Data

Google Inc., 1600 Amphitheatre Parkway, Mountain View, California 94043, USA.
Nature (Impact Factor: 42.35). 12/2008; 457(7232):1012-4. DOI: 10.1038/nature07634
Source: PubMed

ABSTRACT: Seasonal influenza epidemics are a major public health concern, causing tens of millions of respiratory illnesses and 250,000 to 500,000 deaths worldwide each year. In addition to seasonal influenza, a new strain of influenza virus against which no previous immunity exists and that demonstrates human-to-human transmission could result in a pandemic with millions of fatalities. Early detection of disease activity, when followed by a rapid response, can reduce the impact of both seasonal and pandemic influenza. One way to improve early detection is to monitor health-seeking behaviour in the form of queries to online search engines, which are submitted by millions of users around the world each day. Here we present a method of analysing large numbers of Google search queries to track influenza-like illness in a population. Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day. This approach may make it possible to use search queries to detect influenza epidemics in areas with a large population of web search users.
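
The core idea in the abstract, estimating the percentage of influenza-like-illness (ILI) physician visits from the relative frequency of flu-related queries, can be illustrated with a small sketch. The abstract does not give the model form, so a simple linear fit on the log-odds scale is assumed here. All data below are synthetic, and every function and variable name is hypothetical rather than taken from the paper.

```python
import math
import random


def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: maps a real number back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def estimate_ili(query_fraction, intercept, slope):
    """Estimate the ILI visit fraction from a weekly query fraction."""
    return inv_logit(intercept + slope * logit(query_fraction))


# Synthetic data only: two years of weekly ILI visit fractions with a
# seasonal cycle, and a noisy query fraction that tracks them.
rng = random.Random(0)
true_ili = [
    0.01 + 0.04 * (0.5 + 0.5 * math.sin(2 * math.pi * week / 52))
    for week in range(104)
]
query_fraction = [
    inv_logit(0.8 * logit(p) - 1.0 + rng.gauss(0, 0.05)) for p in true_ili
]

# Fit on the first year, then estimate the second year from queries alone.
train_q, train_p = query_fraction[:52], true_ili[:52]
test_q, test_p = query_fraction[52:], true_ili[52:]
intercept, slope = fit_line(
    [logit(q) for q in train_q], [logit(p) for p in train_p]
)
estimates = [estimate_ili(q, intercept, slope) for q in test_q]
corr = pearson(estimates, test_p)
print(f"held-out correlation: {corr:.3f}")
```

Fitting on one period and checking correlation on a later, held-out period is one way to test whether query frequency tracks reported ILI activity; a real system would also need to select which queries to count, which this sketch does not attempt.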

    • "Such an extreme potential of Internet search data has been put into practice and it is now being used for tracking or even anticipating various social phenomena. The utilization ranges from influenza tracking (Dugas et al. 2012; Ginsberg et al. 2008), consumer interest and its impact on product sales (Choi and Varian 2009; Goel et al. 2010; Kulkarni 2012) to macroeconomic indicators (Askitas and Zimmermann 2009; Cooper et al. 2005; Preis et al. 2010). The work of Merton (1987) suggests that attention may be also relevant for the complex reality of financial markets and Preis et al. (2008) are among the first ones to support this hypothesis using the web search data to proxy attention."
    ABSTRACT: Online activity of Internet users has proven very useful in modeling various phenomena across a wide range of scientific disciplines. In our study, we focus on two stylized facts or puzzles surrounding the initial public offerings (IPOs) - the underpricing and the long-term underperformance. Using the Internet searches on Google, we proxy the investor attention before and during the day of the offering to show that the high attention IPOs have different characteristics than the low attention ones. After controlling for various effects, we show that investor attention still remains a strong component of the high initial returns (the underpricing), primarily for the high sentiment periods. Moreover, we demonstrate that the investor attention partially explains the overoptimistic market reaction and thus also a part of the long-term underperformance.
    SpringerPlus 12/2015; 4(1):84. DOI:10.1186/s40064-015-0839-4
    • "Google data has already been applied in forecasting flu (Ginsberg et al. 2009), economic indicators (Choi and Varian 2012) and private consumption (Vosen and Schmidt 2011). Koop and Onorante (2013) use Google data in a dynamic model selection approach for macroeconomic nowcasting, stressing the fact that including the Google variables in a regression framework might not always be optimal because of the nonlinear dynamics of the attention process."
    ABSTRACT: This paper proposes an empirical similarity approach to forecast weekly volatility by using search engine data as a measure of investors attention to the stock market index. Our model is assumption free with respect to the underlying process of investors attention and significantly outperforms conventional time-series models in an out-of-sample forecasting framework. We find that especially in high-volatility market phases prediction accuracy increases together with investor attention. The practical implications for risk management are highlighted in a Value-at-Risk forecasting exercise, where our model produces significantly more accurate forecasts while requiring less capital due to fewer overpredictions.
    Journal of Economic Behavior & Organization 09/2015; 117:62-81. DOI:10.1016/j.jebo.2015.06.005 · 1.01 Impact Factor
    • "One interesting example is the Google Flu Index that was mentioned earlier. In 2009, a team from Google Inc. and the Centers for Disease Control and Prevention (CDC) published a paper in Nature that described the development of a methodology for examining billions of Google search queries in order to monitor influenza in the general population (Ginsberg et al., 2009)."
    DESCRIPTION: Paper presented at the Workshop on Big Data and Urban Informatics, University of Illinois at Chicago, August 2014.