Using Web Search Query Data to Monitor Dengue
Epidemics: A New Model for Neglected Tropical Disease Surveillance
Emily H. Chan1,2, Vikram Sahai3, Corrie Conrad3, John S. Brownstein1,2,4*
1Children’s Hospital Informatics Program, Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Boston, Massachusetts, United
States of America, 2Division of Emergency Medicine, Children’s Hospital Boston, Boston, Massachusetts, United States of America, 3Google Inc., Mountain View,
California, United States of America, 4Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, United States of America
Background: A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and
reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as
Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have
evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a
vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early
detection and monitoring of dengue epidemics.
Methodology/Principal Findings: Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available
data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the
fraction of Google search query volume for specific dengue-related queries from that country against a time series of official
dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize
model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset
of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found
to fit the data quite well, with validation correlations ranging from 0.82 to 0.99.
Conclusions/Significance: Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil,
India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after
some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to
traditional dengue surveillance.
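The model-fitting and validation procedure summarized above can be illustrated with a minimal sketch. It assumes that case counts are modeled as a linear function of the query fraction, that the first part of each series serves as the training subset, and that validation is measured with the Pearson correlation. The function names and the numeric series are hypothetical and are not taken from the study; the removal of spurious spikes is not shown.

```python
from statistics import mean


def fit_univariate(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def validate(query_fraction, case_counts, n_train):
    """Fit on the first n_train points, then correlate predictions with
    official counts over the whole series and over the holdout only."""
    intercept, slope = fit_univariate(
        query_fraction[:n_train], case_counts[:n_train]
    )
    predicted = [intercept + slope * q for q in query_fraction]
    return {
        "intercept": intercept,
        "slope": slope,
        "r_overall": pearson(predicted, case_counts),
        "r_holdout": pearson(predicted[n_train:], case_counts[n_train:]),
    }


# Synthetic, made-up series for illustration only (not data from the study).
query_fraction = [0.10, 0.12, 0.20, 0.35, 0.50, 0.42,
                  0.30, 0.18, 0.11, 0.25, 0.45, 0.33]
case_counts = [105, 118, 205, 340, 510, 415,
               310, 175, 112, 260, 440, 335]

result = validate(query_fraction, case_counts, n_train=8)
print(result)
```

Because the model has a single predictor, the correlation between predicted and official counts equals, up to sign, the correlation between the query fraction and the official counts; the holdout correlation indicates whether that relationship persists outside the training period.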
Citation: Chan EH, Sahai V, Conrad C, Brownstein JS (2011) Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical
Disease Surveillance. PLoS Negl Trop Dis 5(5): e1206. doi:10.1371/journal.pntd.0001206
Editor: Serap Aksoy, Yale School of Public Health, United States of America
Received April 12, 2011; Accepted May 2, 2011; Published May 31, 2011
Copyright: © 2011 Chan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Two of the authors (VS, CC) are employees of one of the funders of the study (Google Inc.) and were involved with the study design, data collection
and analysis, decision to publish, and preparation of the manuscript.
Competing Interests: This study was supported by funding from Google Inc., and two of the authors (VS, CC) are employees of Google Inc.
* E-mail: firstname.lastname@example.org
With an estimated 500 million people infected each year [1],
dengue ranks as one of the most significant mosquito-borne viral
human diseases, and one of the most rapidly emerging vector-borne
diseases [2,3]. Dengue is considered endemic in over 100
countries, mostly in South-East Asia, the Americas and the Western
Pacific islands, and recent estimates from the Pediatric
Dengue Vaccine Initiative put the population at risk at 3.6 billion,
or 55% of the world population.
Most national surveillance systems for dengue in endemic
countries currently depend on passive or sentinel site surveillance
of hospitalizations with some countries also monitoring outpatient
clinics. However, weaknesses in these systems including non-
streamlined bureaucratic structuring, politics and lack of funding
for skilled personnel and equipment at local level laboratories have
been cited as interfering with timely reporting and confirmation of
Alternative approaches to surveillance have turned to data
outside of the virological or clinical domains with the hope of
capturing health-seeking behavior at the earlier stages of disease
progression, as well as capturing the population of the ill who do not
seek medical care formally. Examples of these data include
telephone triage calls [5], sales of over-the-counter drugs [6],
school/work absenteeism [7], and online activity [8–12]. These
data could complement traditional surveillance by potentially
facilitating earlier detection, though results with respect to
correlation and timeliness have been variable [13]. Even if the
signals in one data source are no earlier than in another, there is
benefit in using data that provide access to information on a more
real-time or near real-time basis. The value of "predicting the
present" for situations where data for the present may theoretically
be available but not be accessible until the future is discussed in [14].
These novel approaches have so far for the most part been
narrowly focused and validated on influenza-like and gastrointes-
tinal illness. One example of such an effort is Google Flu Trends
www.plosntds.org1May 2011 | Volume 5 | Issue 5 | e1206
1. Beatty ME, Stone A, Fitzsimons DW, Hanna JN, Lam SK, et al. (2010) Best
practices in dengue surveillance: a report from the Asia-Pacific and Americas
Dengue Prevention Boards. PLoS Negl Trop Dis 4: e890–e890.
2. Guzman MG, Halstead SB, Artsob H, Buchy P, Farrar J, et al. (2010) Dengue: a
continuing global threat. Nat Rev Microbiol 8: S7–S16.
3. Special Programme for Research & Training in Tropical Diseases (2007)
Scientific working group report on dengue. Geneva, Switzerland: World Health
4. Runge-Ranzinger S, Horstick O, Marx M, Kroeger A (2008) What does dengue
disease surveillance contribute to predicting and detecting outbreaks and
describing trends? Trop Med Int Health 13: 1022–1041.
5. Yih WK, Teates KS, Abrams A, Kleinman K, Kulldorff M, et al. (2009)
Telephone triage service data for detection of influenza-like illness. PLoS One 4:
6. Das D, Metzger K, Heffernan R, Balter S, Weiss D, et al. (2005) Monitoring
over-the-counter medication sales for early detection of disease outbreaks--New
York City. MMWR Morb Mortal Wkly Rep 54 Suppl. pp 41–46.
7. Besculides M, Heffernan R, Mostashari F, Weiss D (2005) Evaluation of school
absenteeism data for early outbreak detection, New York City. BMC Public
Health 5: 105–105.
8. Eysenbach G (2006) Infodemiology: tracking flu-related searches on the web for
syndromic surveillance. AMIA Annu Symp Proc. pp 244–248.
9. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, et al. (2009)
Detecting influenza epidemics using search engine query data. Nature 457:
10. Hulth A, Rydevik G, Linde A (2009) Web queries as a source for syndromic
surveillance. PLoS One 4: e4378–e4378.
11. Johnson HA, Wagner MM, Hogan WR, Chapman W, Olszewski RT, et al.
(2004) Analysis of Web access logs for surveillance of influenza. Stud Health
Technol Inform 107: 1202–1206.
12. Polgreen PM, Chen Y, Pennock DM, Nelson FD (2008) Using internet searches
for influenza surveillance. Clin Infect Dis 47: 1443–1448.
13. Dailey L, Watkins RE, Plant AJ (2007) Timeliness of data sources used for
influenza surveillance. J Am Med Inform Assoc 14: 626–631.
14. Choi H, Varian H (2009) Predicting the Present with Google Trends.
15. Pelat C, Turbelin Cm, Bar-Hen A, Flahault A, Valleron A-J (2009) More
diseases tracked by using Google Trends. Emerg Infect Dis 15: 1327–1328.
16. Gómez-Dantés H, Willoquet JR (2009) Dengue in the Americas: challenges for
prevention and control. Cad Saude Publica 25: S19–S31.
17. Chairulfatah A, Setiabudi D, Agoes R, van Sprundel M, Colebunders R (2001)
Hospital based clinical surveillance for dengue haemorrhagic fever in Bandung,
Indonesia 1994-1995. Acta Trop 80: 111–115.
18. Oum S, Chandramohan D, Cairncross S (2005) Community-based surveillance:
a pilot study from rural Cambodia. Trop Med Int Health 10: 689–697.
19. Camacho T, de la Hoz F, Cárdenas V, Sánchez C, de Calderón L, et al. (2004)
Incomplete surveillance of a dengue-2 epidemic in Ibagué, Colombia, 1995-
1997. Biomedica 24: 174–182.
20. Dechant EJ, Rigau-Pérez JG (1999) Hospitalizations for suspected dengue in
Puerto Rico, 1991-1995: estimation by capture-recapture methods. The Puerto
Rico Association of Epidemiologists. Am J Trop Med Hyg 61: 574–578.
21. Yew YW, Ye T, Ang LW, Ng LC, Yap G, et al. (2009) Seroepidemiology of
dengue virus infection among adults in Singapore. Ann Acad Med Singapore 38:
22. e-Technology Group IMRB International and Internet, Mobile Association of
India (IAMAI) (2010) Internet for Rural India: 2009.
23. Chen LH, Wilson ME (2010) Dengue and chikungunya infections in travelers.
Curr Opin Infect Dis 23: 438–444.
24. Chahar HS, Bharaj P, Dar L, Guleria R, Kabra SK, et al. (2009) Co-infections
with chikungunya virus and dengue virus in Delhi, India. Emerg Infect Dis 15: