Analysis of Web access logs for surveillance of influenza.

RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, PA 15219, USA.
Studies in health technology and informatics 02/2004; 107(Pt 2):1202-6.
Source: PubMed

ABSTRACT The purpose of this study was to determine whether the level of influenza in a population correlates with the number of times that internet users access information about influenza on health-related Web sites. We obtained Web access logs from the Healthlink Web site. Web access logs contain information about the user and the information the user accessed, and are maintained electronically by most Web sites, including Healthlink. We developed weekly counts of the number of accesses of selected influenza-related articles on the Healthlink Web site and measured their correlation with traditional influenza surveillance data from the Centers for Disease Control and Prevention (CDC) using the cross-correlation function (CCF). We defined timeliness as the time lag at which the correlation was a maximum. There was a moderately strong correlation between the frequency of influenza-related article accesses and the CDC's traditional surveillance data, but the results on timeliness were inconclusive. With improvements in methods for performing spatial analysis of the data and the continuing increase in Web searching behavior among Americans, Web article access has the potential to become a useful data source for public health early warning systems.
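The lag analysis described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `cross_correlation` and `timeliness`, the `max_lag` parameter, and the two synthetic weekly series are all assumptions made for the example. It also simplifies the CCF by computing a Pearson correlation on the overlapping portion of the two series at each lag, rather than using full-series means and variances.

```python
import numpy as np


def cross_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in [-max_lag, max_lag].

    A positive lag means x leads y by that many time steps (weeks here).
    Simplification: Pearson correlation on the overlapping segments only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D series of equal length")
    n = len(x)
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        ccf[lag] = float(np.corrcoef(a, b)[0, 1])
    return ccf


def timeliness(x, y, max_lag):
    """Return (lag, correlation) at which the cross-correlation is largest."""
    ccf = cross_correlation(x, y, max_lag)
    best_lag = max(ccf, key=ccf.get)
    return best_lag, ccf[best_lag]


# Synthetic data for illustration only (not from the study): one season of
# weekly values where the hypothetical Web series peaks two weeks earlier.
weeks = np.arange(52)
cdc_series = np.exp(-0.5 * ((weeks - 26) / 4.0) ** 2)
web_series = np.exp(-0.5 * ((weeks - 24) / 4.0) ** 2)

best_lag, best_r = timeliness(web_series, cdc_series, max_lag=6)
print(f"max correlation {best_r:.3f} at lag {best_lag} week(s)")
```

With real data, `web_series` would hold the weekly article-access counts and `cdc_series` the surveillance values for the same weeks; a positive best lag would suggest that Web accesses lead the surveillance data.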

  • ABSTRACT: This paper presents a survey of novel technologies for uncovering implicit knowledge through the analysis of user-contributed content in Web 2.0 applications. The special features of emergent semantics are herein described, along with the various dimensions that the techniques should be able to handle. Consequently a series of application domains is given where the extracted information can be consumed. The relevant techniques are reviewed and categorised according to their capability for scaling, multi-modal analysis, social networks analysis, semantic representation, real-time and spatio-temporal processing. A showcase of such an emergent semantics extraction application, namely ClustTour, is also presented, and open issues and future challenges in this new field are discussed.
  • ABSTRACT: Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data, such as social media and search queries, are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with r² up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.
    PLoS Computational Biology 11/2014; 10(11):e1003892. DOI:10.1371/journal.pcbi.1003892 · 4.83 Impact Factor
  • ABSTRACT: The search data of an E-commerce site reflect the concerns and interests of millions of online consumers, as well as trends in their behavior, so they can also provide an essential data basis for studying the daily number of E-commerce orders. This paper first establishes a conceptual framework, revealing that there is a certain correlation and lead-lag relationship between E-commerce site search data and orders. Then, through empirical analysis of the orders and search data of an E-commerce site, the paper constructs a search data index and builds a prediction model based on it, which confirms that this relationship is statistically significant. Prediction results for the following seven days show that the model works well, which can help E-commerce enterprises improve their inventory management and support their steady development.
    2013 6th International Conference on Information Management, Innovation Management and Industrial Engineering (ICIII); 11/2013
