Analysis of Web access logs for surveillance of influenza

RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, PA 15219, USA.
Studies in health technology and informatics 02/2004; 107(Pt 2):1202-6.
Source: PubMed


The purpose of this study was to determine whether the level of influenza in a population correlates with the number of times that Internet users access information about influenza on health-related Web sites. We obtained Web access logs from the Healthlink Web site. Web access logs record information about each user and the content that user accessed, and most Web sites, including Healthlink, maintain them electronically. We computed weekly counts of accesses to selected influenza-related articles on the Healthlink Web site and measured their correlation with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) using the cross-correlation function (CCF). We defined timeliness as the time lag at which the correlation reached its maximum. There was a moderately strong correlation between the frequency of influenza-related article accesses and the CDC's traditional surveillance data, but the results on timeliness were inconclusive. With improvements in methods for performing spatial analysis of the data and the continuing increase in Web searching behavior among Americans, Web article access has the potential to become a useful data source for public health early warning systems.
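
The lag-based definition of timeliness above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_lag` parameter, and the synthetic series are all hypothetical, and the per-lag Pearson correlation over the overlapping weeks is a simplified stand-in for a full CCF estimator. A positive lag here means the Web access series leads the surveillance series by that many weeks.

```python
import numpy as np


def cross_correlation(web_counts, surveillance, max_lag):
    """Correlate web_counts[t] with surveillance[t + lag] for each lag.

    Returns a dict mapping lag (in weeks) to the Pearson correlation of the
    overlapping portion of the two series. This is a simplified variant of
    the CCF: means and variances are recomputed on each overlap.
    """
    web = np.asarray(web_counts, dtype=float)
    ref = np.asarray(surveillance, dtype=float)
    if web.ndim != 1 or web.shape != ref.shape:
        raise ValueError("both series must be 1-D and of equal length")
    n = len(web)
    if max_lag < 0 or n - max_lag < 3:
        raise ValueError("max_lag must leave at least 3 overlapping weeks")

    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = web[: n - lag], ref[lag:]
        else:
            a, b = web[-lag:], ref[: n + lag]
        ccf[lag] = float(np.corrcoef(a, b)[0, 1])
    return ccf


def timeliness(ccf):
    """Return the lag at which the correlation is largest."""
    return max(ccf, key=ccf.get)


# Synthetic illustration only (not data from the study): the surveillance
# series is the Web series delayed by two weeks.
signal = np.sin(np.linspace(0.0, 3.0 * np.pi, 54))
web_counts = signal[2:]
surveillance = signal[:52]

ccf = cross_correlation(web_counts, surveillance, max_lag=4)
best_lag = timeliness(ccf)
print(best_lag, round(ccf[best_lag], 3))
```

With real data, the correlation curve is often flat near its peak, so the lag of the maximum can be unstable; this is one way a timeliness result can end up inconclusive even when the correlation itself is clear.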


Available from: William R Hogan
    • "The extensive development of world wide web gives rise to novel sources for flu detection. Common approaches on web-based influenza surveillance usually make use of click-through data from search engines [16] such as counting search queries submitted to a medical website [19], visitors to health websites [22] or clicks on a search keyword advertisement [34]. Another important click-based flu reporting system is the famous Google's flu trends service [11] [17]."
    ABSTRACT: Influenza is an acute respiratory illness that occurs virtually every year and results in substantial disease, death and expense. Detection of Influenza in its earliest stage would facilitate timely action that could reduce the spread of the illness. Existing systems such as CDC and EISS which try to collect diagnosis data, are almost entirely manual, resulting in about two-week delays for clinical data acquisition. Twitter, a popular microblogging service, provides us with a perfect source for early-stage flu detection due to its real-time nature. For example, when a flu breaks out, people that get the flu may post related tweets which enables the detection of the flu breakout promptly. In this paper, we investigate the real-time flu detection problem on Twitter data by proposing Flu Markov Network (Flu-MN): a spatio-temporal unsupervised Bayesian algorithm based on a 4 phase Markov Network, trying to identify the flu breakout at the earliest stage. We test our model on real Twitter datasets from the United States along with baselines in multiple applications, such as real-time flu breakout detection, future epidemic phase prediction, or Influenza-like illness (ILI) physician visits. Experimental results show the robustness and effectiveness of our approach. We build up a real time flu reporting system based on the proposed approach, and we are hopeful that it would help government or health organizations in identifying flu outbreaks and facilitating timely actions to decrease unnecessary mortality.
    Preview · Article · Sep 2013
    • "Google Flu Trends (Ginsberg et al., 2008) tracks the rate of influenza using query logs on a daily basis, up to 7 to 10 days faster than CDC's FluView (Carneiro and Mylonakis, 2009). Similar results have been reported for several other types of query logs (Valdivia et al., 2010; Polgreen et al., 2008; Hulth et al., 2009; Johnson et al., 2004; Pelat et al., 2009). Lampos and Cristianini (2010) are able to learn a Twitter flu rate producing a 0.97 correlation with the UK's Health Protection Agency influenza infection rates for the second half of 2009. "
    ABSTRACT: We present the Ailment Topic Aspect Model (ATAM), a new topic model for Twitter that associates symptoms, treatments and general words with diseases (ailments). We train ATAM on a new collection of 1.6 million tweets discussing numerous health related topics. ATAM isolates more coherent ailments, such as influenza, infections, obesity, as compared to standard topic models. Furthermore, ATAM matches influenza tracking results produced by Google Flu Trends and previous influenza specialized Twitter models compared with government public health data.
    Preview · Article · Jan 2012
    • "There has been growing interest in monitoring disease outbreaks using the Internet. Previous approaches have applied data mining techniques to news articles (Grishman et al., 2002; Mawudeku and Blench, 2006; Brownstein et al., 2008; Reilly et al., 2008; Collier et al., 2008; Linge et al., 2009), blogs (Corley et al., 2010), search engine logs (Eysenbach, 2006; Polgreen et al., 2008; Ginsberg et al., 2009), and Web browsing patterns (Johnson et al., 2004). The recent emergence of micro-blogging services such as "
    ABSTRACT: We analyze over 500 million Twitter messages from an eight month period and find that tracking a small number of flu-related keywords allows us to forecast future influenza rates with high accuracy, obtaining a 95% correlation with national health statistics. We then analyze the robustness of this approach to spurious keyword matches, and we propose a document classification component to filter these misleading messages. We find that this document classifier can reduce error rates by over half in simulated false alarm experiments, though more research is needed to develop methods that are robust in cases of extremely high noise.
    Preview · Article · Jul 2010