Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance

Children's Hospital Informatics Program, Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Boston, Massachusetts, USA.
PLoS Neglected Tropical Diseases (Impact Factor: 4.45). 05/2011; 5(5):e1206. DOI: 10.1371/journal.pntd.0001206
Source: PubMed


A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics.
Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models fit the data well, with validation correlations ranging from 0.82 to 0.99.
Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to traditional dengue surveillance.
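
The fitting and validation procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data below are synthetic values invented for illustration, the 8/4 training/holdout split is an arbitrary assumption, and a plain least-squares line on untransformed values is assumed because the abstract does not state any transformation. Query selection and spike removal are omitted.

```python
from statistics import mean


def fit_univariate(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# Synthetic series (hypothetical, NOT from the study): per-period fraction of
# search volume for dengue-related queries, and official dengue case counts.
search_fraction = [0.10, 0.12, 0.20, 0.35, 0.50, 0.42,
                   0.30, 0.18, 0.15, 0.25, 0.40, 0.33]
case_counts = [110, 125, 210, 340, 520, 430,
               290, 200, 160, 240, 410, 350]

# Assumed split: first 8 periods for training, last 4 held out.
split = 8
intercept, slope = fit_univariate(search_fraction[:split], case_counts[:split])

# Predict the holdout periods from search data alone, then compare the
# predictions with the official counts for those periods.
predicted = [intercept + slope * x for x in search_fraction[split:]]
holdout_r = pearson(predicted, case_counts[split:])

print(f"slope={slope:.1f} intercept={intercept:.1f} holdout r={holdout_r:.3f}")
```

Because the model has a single predictor, the holdout correlation measures how well search activity tracks the shape of the epidemic curve rather than how accurate the predicted case counts are in absolute terms.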

Available from: PubMed Central
  • Source
    • "First, online surveillance offers immediate insights into the present status of disease. That is, online surveillance may "predict the present" [48] without the reporting lags associated with complicated reporting procedures in public health bureaucracies [44]. Second, online surveillance may overcome the weaknesses of traditional surveillance systems, such as poor sensitivity to new diseases [49] and the lack of skills and equipment required for early disease detection [50]."
    ABSTRACT: Human immunodeficiency virus (HIV) is a serious health problem in the Russian Federation. However, the true scale of HIV in Russia has long been the subject of considerable debate. Using digital surveillance to monitor diseases has become increasingly popular in high income countries. But Internet users may not be representative of overall populations, and the characteristics of the Internet-using population cannot be directly ascertained from search pattern data. This exploratory infoveillance study examined if Internet search patterns can be used for disease surveillance in a large middle-income country with a dispersed population. This study had two main objectives: (1) to validate Internet search patterns against national HIV prevalence data, and (2) to investigate the relationship between search patterns and the determinants of Internet access. We first assessed whether online surveillance is a valid and reliable method for monitoring HIV in the Russian Federation. Yandex and Google both provided tools to study search patterns in the Russian Federation. We evaluated the relationship between both Yandex and Google aggregated search patterns and HIV prevalence in 2011 at national and regional tiers. Second, we analyzed the determinants of Internet access to determine the extent to which they explained regional variations in searches for the Russian terms for "HIV" and "AIDS". We sought to extend understanding of the characteristics of Internet searching populations by data matching the determinants of Internet access (age, education, income, broadband access price, and urbanization ratios) and searches for the term "HIV" using principal component analysis (PCA). We found generally strong correlations between HIV prevalence and searches for the terms "HIV" and "AIDS". 
National correlations for Yandex searches for "HIV" were very strongly correlated with HIV prevalence (Spearman rank-order coefficient [rs]=.881, P≤.001) and strongly correlated for "AIDS" (rs=.714, P≤.001). The strength of correlations varied across Russian regions. National correlations in Google for the term "HIV" (rs=.672, P=.004) and "AIDS" (rs=.584, P≤.001) were weaker than for Yandex. Second, we examined the relationship between the determinants of Internet access and search patterns for the term "HIV" across Russia using PCA. At the national level, we found Principal Component 1 loadings, including age (-0.56), HIV search (-0.533), and education (-0.479) contributed 32% of the variance. Principal Component 2 contributed 22% of national variance (income, -0.652 and broadband price, -0.460). This study contributes to the methodological literature on search patterns in public health. Based on our preliminary research, we suggest that PCA may be used to evaluate the relationship between the determinants of Internet access and searches for health problems beyond high-income countries. We believe it is in middle-income countries that search methods can make the greatest contribution to public health.
    Full-text · Article · Nov 2013 · Journal of Medical Internet Research
  • Source
    • "In 2007, Venezuela, alone, reported more than 80,000 cases, including more than 6,000 cases of DHF. Google dengue trends provides the daily updates of current dengue fever activity for 10 countries, which enables earlier detection of outbreaks and epidemics and public health officials to mobilize outbreak containment measures in a timely manner (Chan et al., 2011). Recent outbreaks of dengue fever have been reported from many countries in Central and South America and Mexico, as well as various regions in Southeast Asia; dengue is endemic in these areas and should always be considered in the differential diagnosis of acute febrile illness, especially in travelers returning from that area (Communicable Diseases Communiqué, 2012). "
    Chapter: Dengue
    ABSTRACT: Dengue is caused by the dengue virus (DENV), a member of the Flavivirus genus of the Flaviviridae family. This family consists of enveloped, positive-stranded RNA viruses. The dengue viruses are comprised of four distinct serotypes, DENV1 through DENV4, which are mainly transmitted to humans through the bites of two mosquito species, Aedes aegypti and Aedes albopictus. The female Aedes (Stegomyia) mosquito transmits the dengue virus from person to person in the domestic environment. It can also be transmitted via infected blood products and through organ donation (Stramer et al., 2009; Wilder-Smith et al., 2009). Mother-to-child transmission (vertical transmission) during pregnancy or at birth has also been reported by Wiwanitkit (2010). Some other person-to-person modes of transmission have also been reported, but these are very unusual (Chen and Wilson, 2010). The origin of the dengue infection is still unclear. An epidemic of “knee fever” was described in Cairo, Egypt, in 1779 (Thongcharoen and Jatanasen, 1993). The name dengue is actually derived from the Swahili word Ki denga pepo, meaning a sudden seizure by a demon. The term break bone fever was coined during an epidemic in Philadelphia in the United States in 1780 (Ananthanarayan and Paniker, 2000). Outbreaks have occurred in the continental United States in 1780, in Hawaii in 1903, and in Greece during 1927 and 1928. The clinical presentation of dengue fever resembles illness caused by chikungunya and O’nyong-nyong viruses (Ananthanarayan and Paniker, 2000). Over the years, the disease has been given several names: break bone fever, dandy fever, Korean hemorrhagic fever, Thai hemorrhagic fever, Philippine hemorrhagic fever, knee fever, 7-day fever, and Dhaka fever (Thongcharoen and Jatanasen, 1993).
    Full-text · Chapter · Jun 2013
  • Source
    • "Numerous studies have examined how Internet searches can "predict the present", meaning that search volume correlates with contemporaneous events [18-20]. Specifically in the case of influenza, search volume was shown to estimate flu activity, which was not officially reported until two weeks later, and despite unknown flu status of the searchers."
    ABSTRACT: The objective of this study was to investigate the use of novel surveillance tools in a malaria endemic region where prevalence information is limited. Specifically, online reporting for participatory epidemiology was used to gather information about malaria spread directly from the public. Individuals in India were incentivized to self-report their recent experience with malaria by micro-monetary payments. Self-reports about malaria diagnosis status and related information were solicited online via Amazon's Mechanical Turk. Responders were paid $0.02 to answer survey questions regarding their recent experience with malaria. Timing of the peak volume of weekly self-reported malaria diagnosis in 2010 was compared to other available metrics such as the volume over time of and information about the epidemic from media sources. Distribution of Plasmodium species reports was compared with values from the literature. The study was conducted in summer 2010 during a malaria outbreak in Mumbai and expanded to other cities during summer 2011, and prevalence from self-reports in 2010 and 2011 was contrasted. Distribution of Plasmodium species diagnosis through self-report in 2010 revealed 59% for Plasmodium vivax, which is comparable to literature reports of the burden of P. vivax in India (between 50 and 69%). Self-reported Plasmodium falciparum diagnosis was 19% during the 2010 outbreak, and the estimated burden was between 10 and 15%. Prevalence between 2010 and 2011 via self-reports decreased significantly from 36.9% to 19.54% in Mumbai (p = 0.001), and official reports also confirmed a prevalence decrease in 2011. With careful study design, micro-monetary incentives and online reporting are a rapid way to solicit malaria, and potentially other public health information.
This methodology provides a cost-effective way of executing a field study that can act as a complement to traditional public health surveillance methods, offering an opportunity to obtain information about malaria activity, temporal progression, demographics affected or Plasmodium-specific diagnosis at a finer resolution than official reports can provide. The recent adoption of technologies, such as the Internet, supports self-reporting mediums, and self-reporting should continue to be studied as it can foster preventative health behaviours.
    Full-text · Article · Feb 2012 · Malaria Journal