PreprintPDF Available

# Incorporating human mobility data improves forecasts of Dengue fever in Thailand

Authors:
Preprints and early-stage research may not have been peer reviewed yet.
Preprint

# Incorporating human mobility data improves forecasts of Dengue fever in Thailand

## Abstract and Figures

Over 390 million people worldwide are infected with dengue fever each year. In the absence of an effective vaccine for general use, national control programs must rely on hospital readiness and targeted vector control to prepare for epidemics, so accurate forecasting remains an important goal. Many dengue forecasting approaches have used environmental data linked to mosquito ecology to predict when epidemics will occur, but these have had mixed results. Conversely, human mobility, an important driver in the spatial spread of infection, is often ignored. Here we compare time-series forecasts of dengue fever in Thailand, integrating epidemiological data with mobility models generated from mobile phone data. We show that long-distance connectivity is correlated with dengue incidence at forecasting horizons of up to three months, and that incorporating mobility data improves traditional time-series forecasting approaches. Notably, no single model or class of model always outperformed others. We propose an adaptive, mosaic forecasting approach for early warning systems.
Content may be subject to copyright.
Incorporating human mobility data improves forecasts of
Dengue fever in Thailand
Mathew V Kiang ScDa,$, Mauricio Santillana PhDb,c,$, Jarvis T Chen ScDd, Jukka-Pekka Onnela
DSce, Nancy Krieger PhDd, Kenth Engø-Monsen PhDf, Nattwut Ekapiratg, Darin Areechokchaih,
Preecha Prempreeh, Richard J. Maude MD DPhil,g,i,k, and Caroline O Buckee PhDj,k,1,
a Center for Population Health Sciences, Stanford University, Stanford, California USA
b Department of Pediatrics, Harvard Medical School, Boston, Massachusetts USA
c Computational Health Informatics Program, Boston Children’s Hospital, Boston,
Massachusetts USA
d Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health,
Boston, Massachusetts USA
e Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston,
Massachusetts USA
f Telenor Research, Oslo, Norway
g Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol
University, Bangkok, Thailand
h Bureau of Vector Borne Disease, Ministry of Public Health, Nonthaburi, Thailand
i Centre for Tropical Medicine and Global Health, Nuffield Dept of Medicine, University of
Oxford, Oxford, UK
j Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston,
Massachusetts USA
k Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health,
Boston, Massachusetts USA
1 Corresponding Author:
Caroline O Buckee
e-mail: cbuckee@hsph.harvard.edu
phone: 617-432-1280
Center for Communicable Disease Dynamics
677 Huntington Ave, 5th Floor
Boston, MA 02115
$These authors contributed equally. † These authors jointly supervised this work. . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint Abstract Over 390 million people worldwide are infected with dengue fever each year. In the absence of an effective vaccine for general use, national control programs must rely on hospital readiness and targeted vector control to prepare for epidemics, so accurate forecasting remains an important goal. Many dengue forecasting approaches have used environmental data linked to mosquito ecology to predict when epidemics will occur, but these have had mixed results. Conversely, human mobility, an important driver in the spatial spread of infection, is often ignored. Here we compare time-series forecasts of dengue fever in Thailand, integrating epidemiological data with mobility models generated from mobile phone data. We show that long-distance connectivity is correlated with dengue incidence at forecasting horizons of up to three months, and that incorporating mobility data improves traditional time-series forecasting approaches. Notably, no single model or class of model always outperformed others. We propose an adaptive, mosaic forecasting approach for early warning systems. . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint Introduction More than half the world’s population is at risk of infection from the dengue virus, which causes an estimated 390 million infections (Bhatt et al., 2013) and 25,000 deaths per year (CDC, 2018; Guzman and Harris, 2015; WHO, 2018a). The dengue pathogen is spread in urban and peri-urban areas by invasive mosquitoes belonging to the Aedes complex. As a result, dengue has emerged as a major threat in the context of a rapidly urbanizing, globally connected world (Guzman and Harris, 2015; Tatem et al., 2006; Wesolowski et al., 2015b). For example, despite the general decline in the incidence of other communicable diseases, the incidence of dengue fever has doubled every 10 years since 1990 (Stanaway et al., 2016). The rapid geographic expansion of the vector suggests there will be a continuing emergence of dengue globally (Guzman and Harris, 2015; Tatem et al., 2006; Wesolowski et al., 2015b). Currently, there is no drug treatment for dengue (Halstead, 2012; WHO, 2012) and only a partially effective vaccine, which cannot be used in seronegative individuals (WHO, 2018b). Therefore, despite the mixed results of vector control efforts (WHO, 2012), targeted and thorough vector control approaches, hospital readiness, and risk communication can improve public health preparedness for seasonal outbreaks. Fundamental to the success of these preparations is data on the burden of disease in different areas, and some sense of how an epidemic may progress in the near term and on local spatial scales relevant to national control programs. Forecasting the epidemic trajectory of dengue on weekly or monthly timescales remains a relatively new science for infectious diseases (Baquero et al., 2018; Buczak et al., 2018; Choudhury et al., 2008; Eastin et al., 2014; Gharbi et al., 2011; Hii et al., 2012; Hu et al., 2010; Johansson et al., 2016; Lauer et al., 2018; Martinez et al., 2011; Promprou et al., 2006; Nicholas G. Reich et al., 2016; Yamana et al., 2016; Yang et al., 2017). Unlike weather and climate . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint forecasting, where physical laws dictate the dynamics of the system, the social and biological dynamics that drive infectious disease outbreaks make forecasting dengue epidemics challenging. Recurring epidemics, as opposed to novel pathogens emerging for the first time, occur against a backdrop of shifting population immunity, which is difficult to quantify. Complicating surveillance, pathogens like dengue are primarily reported based on symptoms rather than laboratory confirmation. Like influenza and malaria, dengue causes non-specific symptoms, fever in particular, so reporting reliability and time lags impact data quality (Chretien et al., 2016; Olliaro et al., 2018; Scarpino et al., 2017). Despite these complexities, routine forecasting is an important priority for national dengue control programs (Nicholas G. Reich et al., 2016; WHO, 2012). There has been a recent surge of interest and success in building forecasting models for seasonal epidemics of dengue fever (Choudhury et al., 2008; Eastin et al., 2014; Gharbi et al., 2011; Hii et al., 2012; Hu et al., 2010; Johansson et al., 2016; Lauer et al., 2018; Martinez et al., 2011; Promprou et al., 2006; Nicholas G. Reich et al., 2016; Yamana et al., 2016; Yang et al., 2017). A distinction can be made between mechanistic epidemiological models and statistical models. Mechanistic models in which the mode of transmission (in this case, mosquito-borne and strong temperature dependence) is built into the model and drives the predicted infection dynamics. In contrast, statistical models rely on the identification of past epidemiological activity patterns and historical correlations with external data streams, generated often by human behavior on Internet search engines or social media, to monitor disease activity and predict future outbreaks. Mechanistic models aim at providing biological insight and a basis for interpretation, but for socially and environmentally complex infections like dengue, these models are often challenging to parameterize. Dengue is particularly challenging as it is composed of . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint multiple immunologically distinct strains and relies on the interaction of mosquito and human population dynamics and microclimate variability. Metapopulation models have been developed to incorporate the spatial dynamics of dengue outbreaks, modeling each area with a set of location-specific parameters and linking the areas through estimated migration of individuals. Metapopulation models play in an important role in our understanding of epidemic outbreaks across spatial regions (Arino and Driessche, 2003; Liu et al., 2018; Stolerman et al., 2015), synchronicity between regions (Lloyd and Jansen, 2004), oscillations of epidemics (Lourenço and Recker, 2013), and strategies to reduce transmission (Lee and Castillo-Chavez, 2015). Despite their importance in understanding dynamics, mechanistic models, and metapopulation models in particular, may lack sufficient data for appropriate parameterization, and are often not feasible in a forecasting context. As a result, statistical models have been more successful for outbreak preparedness for which the modeling goal is to provide quantitative, relatively short- term predictions with explicit uncertainty (11, 13–20, 22, 28–30). Most statistical forecasting approaches for dengue have been based on autocorrelation in case data, often incorporating environmental information due to the importance of temperature and other factors to the availability of mosquitoes and variation of the incubation period of the virus in the vector. Many of these have focused on long-term predictions of dengue at the city level (Choudhury et al., 2008; Eastin et al., 2014; Luz et al., 2008; Stolerman et al., 2016), or larger regions within a specific country (Gharbi et al., 2011; Hu et al., 2010; Martinez et al., 2011; Promprou et al., 2006). Models often show mixed success with high prediction accuracy in the immediate forecasting horizons (e.g., 1-2 months) and rapid decay at longer time horizons (e.g. 3-6 months). It is unclear if weather or climate variables substantially improve forecasting; at least one study that systematically looked at different model parameters for autoregressive . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint models, with and without a wide range of climate variables, across states in Mexico found no conclusive improvement (Johansson et al., 2016). More recently, ensemble models have become a powerful way to combine different approaches in order to leverage the strengths of each while minimizing the weaknesses (52). This approach has recently been applied to dengue (Yamana et al., 2016). Others have incorporated new sources of data from internet search terms to predict dengue nationally (Yang et al., 2017), employed novel statistical methods to predict dengue in San Juan, Puerto Rico (Ray et al., 2017), or combined common climate covariates with generalized additive models to predict annual incidence of dengue hemorrhagic fever (Lauer et al., 2018). Although dengue outbreaks spread primarily via human travel, incorporating this aspect of the spatial connectivity between locations within forecasting frameworks has been challenging. Current forecasting models, both mechanistic and statistical, either ignore or make crude assumptions about how populations are connected by travel. Parameterizing human mobility is challenging due to a paucity of relevant data streams, particularly in low-income settings. We have previously used mobile phone records to quantify national movements and showed that they provide improved prediction for dengue outbreaks in Pakistan (Wesolowski et al., 2015b). Specifically, we used a gravity models to parametrize human mobility in a mechanistic framework because dengue was emerging into naïve populations, where statistical methods could not be used. Others have used daily commuting data to model mobility using a radiation model, which in turn is used to parameterize a mechanistic model (Zhu et al., 2016). Although considerable difficulty remains in accessing mobile phone records or other scalable data sources about mobility, it is clear that gravity models, radiation models, and other proxies for travel measures may perform poorly in many settings (Wesolowski et al., 2015a). . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint To date, almost all efforts to forecast dengue have either focused on optimizing a single modeling framework across regions, fitting parameters individually, or analyzed multiple models for a particular location. Few statistical models used for forecasting dengue incorporate spatial dependencies and none incorporate information about mobility patterns. Here, we contribute to the existing literature by using seven years of monthly dengue data (2010–2016) from Thailand, which has a developed dengue surveillance program, and mobility data from approximately 11 million mobile phone subscribers to show that long-distance travel is associated with correlated epidemiological cases. We compare model structures incorporating time-series approaches or spatial dependencies, and mobility data, finding that this improves model prediction, but no individual approach provides the best performing model in all locations over all time horizons. We quantify the error for each province in Thailand, showing that provinces in the north of the country are more difficult to forecast with confidence than those in the south, regardless of model choice, and that different models’ performances may be linked to demographic and social factors such as population density and gross provincial product per capita. We propose that mosaic forecasting approaches, which dynamically adapt over time and space, and end up using the best model for that location and time period, are likely to be the most effective for use in early warning systems in national control programs. Results Greater than expected long-distance travel to and from Bangkok To assess inter-province migration, we analyzed the call data records (CDR) of approximately 11 million mobile phone subscribers between August 1, 2017 and October 19, 2017. At the time of data collection, the mobile phone operator had about 26% of the market share and was the third largest provider in Thailand. Since travel patterns remained stable over . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint our period of observation (coefficient of variation: 1.3%; SI Appendix, Fig. S1), we calculated average daily journeys between all pairs of provinces in both directions, and compared observed mobility in the CDR data to expected mobility based on gravity models (see Materials and Methods) assuming travel over our time period is consistent with travel for the rest of the year (SI Appendix, Fig. S2). We found that the routes of travel that deviate significantly from gravity model-based predictions in both directions are focused on Bangkok (Figure 1), with more travel than expected from long distances around the country such as Phuket and Bangkok itself (Figure 1, left), and less travel than expected within and around the city (Figure 1, right). These hot and cold spots, where higher or lower than expected travel was observed, were robust to the gravity model coefficients used (SI Appendix, Table S1). Figure 1. Under- and over-prediction of outlier travel. Relative under-prediction (left) and over-prediction (right) comparing observed mobility data (from CDRs) to estimated mobility data from the best fit gravity model. We defined relative prediction error as 100%*(PredictedTrips – ObservedTrips)/ObservedTrips. We highlight only observations with Cook’s distance greater than five times the average Cook’s distance. Note that Bangkok (center of the map) is central to much of the over- and under-prediction outliers with most over- prediction near Bangkok. . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint Long-distance connectivity is associated with correlated dengue incidence In Thailand, dengue follows a seasonal cycle across all 77 provinces (Figure 2), with variation in the timing of onset and epidemic peak in different locations over our period of observation (Limkittikul et al., 2014). We analyzed the correlation between clinical cases in each province with different time lags between them. Figure 3 shows the relationship between the correlation in dengue cases between pairs of provinces, stratified with respect to geographic distance and mobility measured using mobile phone data. Consistent with previous studies (Cummings et al., 2004; Panhuis et al., 2015; Salje et al., 2017, 2012), the epidemiological correlation between provinces is strongest when they are close to each other and declines with distance and over time (i.e. the three-month lagged correlation is weaker than the one-month lagged correlation). For provinces less than 1,000 km apart, human mobility estimated using mobile phone data does not appear to impact the correlation of clinical cases. For longer distances, however, more highly connected locations show higher correlation in clinical dengue cases than locations the same distance apart but with low observed connectivity (Figure 3). Note that some but not all of these long-distance connections are locations with international airports (SI Appendix, Fig. S3), and provinces connected by airports have higher correlation than those that are not connected by airports (SI Appendix, Fig. S4). . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint Figure 2. Monthly dengue incidence by province. Monthly crude incidence of dengue (per 1,000 person-years) by province (y-axis) ordered by centroid latitude (higher is more northern) over seven years of observation (x-axis). Dengue in Thailand follows a seasonal cycle with geographic variation in both the timing of onset and peak of the epidemic. Figure 3. Correlation of province-level dengue by distance, at different time lags. We show the mean cross-correlation coefficient (y-axis) for pairs of provinces at binned distances (x-axis; 0 indicates correlation of an area with itself) for synchronous dengue (left panel) and lagged by 1 month (middle panel) and three months (right panel). The red line shows the bottom quartile of provinces in terms of incoming and outgoing travel and the blue line shows the top quartile. . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint No one-size-fits-all: forecasting performance varies in space and time We compared several forecasting approaches for the 77 Thai provinces to assess how model performance varied by region and over time, and to measure the impact of integrating the mobility data. Specifically, for each province, we fit four models: (1) local (non-spatially dependent) models commonly used for dengue; specifically, seasonal autoregressive integrated moving average models (Plain SARIMA) across a grid of parameters, (2) SARIMA models that use information from the top five most connected provinces (in terms of number of incoming trips) based on mobile phone data (CDR SARIMA), (3) SARIMA models that use information from the top five most connected provinces (in terms of predicted number of incoming trips) based on our gravity model estimates (Gravity SARIMA), and (4) a data-driven network approach, based on a regularized regression approach, that predicts dengue incidence in a given location potentially using dengue incidence from every other location as input (LASSO; see Materials and Methods and reference (49) for details). Figure 4 illustrates the results of all models at all forecasting horizons for Bangkok (see SI Appendix, Text S1 for online-only results for all other provinces). At early forecasting horizons (i.e., one-month and up to three-months ahead time horizons), all models performed well, with the CDR SARIMA and Gravity SARIMA models outperforming the Plain SARIMA models by about 5–10% (Figure 5) as captured by the mean absolute error. After the 3-month ahead forecasting horizon, the Plain SARIMA model performance drops substantially faster than all other models. Importantly, the grouping of out-of-sample prediction errors, across forecasting horizons, tended to be closer in the LASSO models, indicating that across forecasting horizons, the network models lose predictive power more slowly than the SARIMA-based models. We present all plots for all provinces in an online repository (SI Appendix, Text S1). . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint Figure 4. Mean absolute error (MAE) for all Bangkok models. The mean absolute error (y- axis) expressed as number of cases for each model (x-axis) and for each forecast. Models are grouped as SARIMA with no exogenous variables (Plain), SARIMAs with the top 5 most connected regions based on the predicted trips from a gravity model (Gravity SARIMA), and SARIMAs with the top 5 most connected regions based on CDR data (CDR SARIMA). The rightmost models show a data-driven network model, denoted as LASSO, since it is based on a least absolute shrinkage and selection operator prediction model, and mosaic model. Figure 5. Comparing the best models for Bangkok, by model type. Focusing only on the best performing model for each model type and each time horizon, we show the relative mean absolute error (left panel) and the mean absolute error (right panel). On the left, the baseline of comparison is the traditional AR(1) model and the y-axis can be interpreted as the improvement over this baseline — i.e., a value of .9 indicates a 10% improvement. We show that both the Plain SARIMA (red) and CDR SARIMA (green) models perform better than the LASSO model at earlier forecasting horizons but perform worse at later horizons. . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint In general, no single model or class of model outperformed others across all provinces or all forecasting horizons (Figure 6; SI Appendix, Fig. S5). We found that across all model types, provinces in the south of the country had lower prediction errors compared to those in the north of the country (Figure 6). This difference in forecasting power was particularly pronounced on longer time horizons. For example, when comparing the out-of-sample prediction errors of the CDR SARIMA to the Plain SARIMA, the CDR SARIMAs were worse in 8 tasks for forecasting horizons of 1 to 3 months and better in only 3 tasks with no statistically significant difference in the remaining 220 prediction tasks. However, for forecasting horizons of 4-6 months, the CDR SARIMA outperformed the Plain SARIMA in 40 tasks and only underperformed in 8 with no statistically significant difference in the remaining 183 tasks (SI Appendix, Fig. S6). Figure 6. Mean absolute error for the best model in each class at t+1, t+3, and t+6 forecasting horizons for all provinces. The mean absolute error (y-axis) on the prediction (i.e., log) scale of the best model for each class for all provinces (x-axis). Provinces are ordered by latitude (x-axis, right is more northerly). There is a general decline in predictive power at farther forecasting horizons and at more northerly provinces; however, no single model or class of model performs best across all areas and all prediction horizons. . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint We measured the characteristics of provinces in which different models performed better or worse and found that the Plain SARIMA models performed similarly when comparing top and bottom deciles of total number of dengue cases, median number of monthly dengue cases, median monthly rate of dengue, population density, and GPP per capita. In contrast, the LASSO and mobility-augmented SARIMA models performed better in places with higher total annual cases, higher population, and lower GPP per capita (see SI Appendix, Fig. S8S13), suggesting systematic and generalizable differences in model performance that — with more validation and in combination with geographic variation in model performance — could be used to inform model choice. We show the feasibility of combining different classes of models by using a simple winner-takes-all voting system approach we named an adaptive mosaic model. This ensemble model selects the best performing model for each province and forecasting horizon based on the out-of-sample prediction error of previous three months, which allows the underlying base model to change over time (Figure 7). When comparing the out-of-sample prediction errors to an AR(1) model, the mosaic model outperforms the AR(1) in 107 tasks (i.e., province and forecasting horizon), underperforms the AR(1) in 3 tasks, and is not statistically significantly different from an AR(1) in the remaining 352 tasks (SI Appendix, Fig. S7). Further exploration of location- specific and task-specific voting predictions systems is outside of the scope of this study but should be explored in future research efforts. . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint Figure 7. Mosaic model vs AR(1) for Bangkok at t+1, t+3, and t+6 forecasting horizons. We show the predictions for a simple mosaic model at t+1, t+3, and t+6 forecasting horizons for Bangkok in blue. For comparison, we show predictions from an AR(1) in red and observed cases in grey bars. Under each bar, we indicate the base model selected by the mosaic ensemble using a winter-takes-all approach based on the previous three out-of-sample prediction months. Discussion Dengue forecasting remains an important public health challenge in Thailand and other endemic countries. Given the complexity of dengue transmission, statistical forecasting approaches like those examined here have been shown to produce meaningful disease estimates in multiple locations, and may therefore be suitable for immediate use by national control programs. In addition, we have shown that integrating additional data streams, such as information about human mobility, can improve forecasts in many areas, but the added benefit will be specific to the area and time horizon of interest. The interesting geographic variation in forecasting accuracy, which is not linked to population density or GPP per capita, may reflect the proximity to international borders with countries where frequent migration occurs. Overall, no single modeling approach can be expected to provide an optimal early warning system across all areas, even within a single country or region, or across all time horizons. So adaptive, mosaic forecasts are likely to provide the most effective approach. This type of approach could be easily . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint integrated within the data platforms recently developed in Thailand (Nicholas G. Reich et al., 2016), which are flexible enough to accommodate different modeling approaches and forecasts. We show that simple network methods (that implicitly incorporate human mobility) can improve upon commonly-used local SARIMA models. Also, given that the network-based approach we studied relied only on dengue case count data routinely collected by most endemic countries, we envision that similar approaches may be easily extended, and may prove to be meaningful, in many other locations around the Globe. The regularized multi-variate regression framework can also flexibly identify and incorporate additional province-level data, time lags, and other factors in the predictive model, that could be used as a hypothesis-generation tool that may capture temporal changes in inter-regional human mobility. We highlight the fact that even though the mobility data we used covered only a small fraction of time represented in the dengue case data (3.2%; i.e., 81 days vs 7 years), it was still able to improve the local (non-augmented) SARIMA, suggesting that even relatively coarse travel information would improve naïve SARIMA models. Although mobile phone data is challenging to obtain, the coarse granularity of mobility information that we used completely protects individual subscriber privacy while adding substantially to forecasting performance. Since it is continuously collected, there is no reason these data could not be aggregated by mobile operators and provided on a relatively frequent basis to disease control programs. A limitation of using CDR to model dengue transmission is that it reflects movement patterns of the entire population whereas dengue tends to occur more in children and young adults in urban areas (Limkittikul et al., 2014). As governments prioritize how and where to spend money to improve dengue surveillance, our study suggests new regularized regression frameworks that incorporate mobility data can improve forecasts substantially. Any forecasting model will depend on the quality of the . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint case data that it is trained upon, highlighting the primary importance of good epidemiological data. A limitation of this work is that most dengue cases in Thailand, as in most countries, are not confirmed with a diagnostic test, instead relying in syndromic surveillance. This can be unreliable with the case definition for dengue fever overlapping substantially with other causes of acute febrile illness and the completeness of the data relying on individual healthcare workers to complete the reporting forms. Thus, much of the money for better dengue forecasting should be focused on faster and better dengue case detection, more widespread diagnostic testing, sentinel surveillance of serotypes, a robust computational framework for sharing case data across regions to be analyzed centrally, and capacity building within control programs. Materials and Methods Dengue incidence data We obtained monthly dengue case counts for over 7,000 subdistricts in Thailand from the Ministry of Public Health. These data are not available publicly and are used with the permission of the Ministry of Public Health. They consist of monthly dengue incidence counts from January 2010 through December 2016, by mutually-exclusive disease type (i.e., dengue fever, dengue shock syndrome, or dengue hemorrhagic fever). We aggregated these data to the province level and overall dengue case counts. In our data, there was a national average of 91,000 dengue cases per year with a range of 39,368 (2014) to 145,600 (2013) cases per year. Mobile phone data To assess inter-province travel, we analyzed call data records (CDRs) of approximately 11 million mobile phone subscribers between August 1, 2017 and October 19, 2017. At the time of data collection, the mobile phone operator had about 26% of the market share and was the third largest provider in Thailand. In order to ensure the privacy of the mobile phone subscribers, . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint and in compliance with national laws and the privacy policy of the Telenor group, special considerations were taken with the CDRs. First, only the mobile operator had access to the CDR and all data processing was performed on a server owned by, and only available to, the operator, thus ensuring that detailed records never left the operator or Thailand. Second, the operator provided researchers with a list of approximate cellular tower locations. For every tower location, we returned a corresponding, unlinked geographic identifier (“geocode”) of the nearest subdistrict. Mobile operator employees then aggregated the detailed CDRs up to the researcher- provided geocodes. Further spatial and temporal aggregation was performed by the researchers. These data are not publicly available and are used with the permission of Telenor Research. To quantify travel, every subscriber was assigned a daily “home” location based on their most frequently used geocode. We tabulated daily travel between a subscriber’s home location on one day relative to the day before. Trips were aggregated to geocode-to-geocode pairs for every day and thus are memoryless — preventing the ability to trace a user (or group of users) across more than two days or more than two areas. We normalized the number of trips from geocode i to geocode j by the number of subscribers at geocode i. We then multiplied this proportion by the estimated population at geocode i to get the flow from i to j. This assumes that subscribers are more or less uniformly distributed across provinces (weighted by the population in each province). While this assumption cannot be fully tested, there is a strong correlation (Pearson’s r = .90) between subscribers and population for each province. On average, 11.4 million subscribers (16.7% of the total population) recorded at least one event (i.e., phone call, text message, internet activity) per day (SI Appendix, Fig. S1). At both the national and provincial levels, no significant deviations from the number of subscribers or the numbers of trips occurred during this time period. For example, at the national level, the . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint coefficient of variation for daily number of subscribers was 1.3%. Therefore, we used the mean number of trips over this time period as our estimate of inter-province travel. Population, gross provincial product per capita, and distance estimates To estimate province-level population, we used the United Nations-adjusted 2015 population estimates from WorldPop (Gaughan et al., 2013), which combines remote-sensing data with other data sources to create random-forest-generated population maps. Each file contains the estimated population per pixel and was overlaid with the official administrative shapefile. We then summed the value of all pixels within each province. We used publicly available 2015 gross provincial product per capita provided by the Office of the National Economic and Social Development Board of Thailand (NESDB, 2017). The concept of “distance” is flexible in the gravity model and geodesic distance often ignores important geographical (e.g., mountain ranges) or social and behavioral constants to human mobility. In addition to calculating geodesic distance between provinces, we calculated road distance and travel time based on OpenStreetMap data using Open Street Routing Machine (Luxen and Vetter, 2011). Comparing observed and predicted travel We compared observed travel between provinces with CDRs to those estimated by a gravity model with three different measures of distance: geodesic distance, road distance, and travel time. The gravity model is a popular econometric model (Tinbergen, 1963), often used to estimate mobility between areas (Lewer and Berg, 2008). The basic gravity model is: 𝑌!" = 𝑘$𝑃!
#𝑃
"
$𝐷!" % . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint where 𝑌!" be the number of people who move from area 𝑖 to area 𝑗 , 𝑘 is a constant term, 𝑃! is the population in area 𝑖 , 𝑃" is the population in area 𝑗 , and 𝐷!" is some measure of distance between 𝑖 and 𝑗 , noting that distance may not be symmetric. The parameters 𝑘 , 𝛼 , 𝛽 , and 𝛾 are estimated by fitting a Poisson model: log / 𝑌!" 0 = 𝑘 + 𝛼$log$(𝑃!)+ 𝛽 log / 𝑃" 0 𝛾$log$(𝐷!"). In addition to the naïve gravity model, we also adjusted for gross provincial product per capita. The best fit according to in-sample error metrics was the adjusted travel time model (SI Appendix, Table S1). We identified outlier observations as those observations with a Cook’s distance greater than five times the mean Cook’s distance. Quantitative methods We evaluated the predictive accuracy of two different types of models: (1) one data- driven network approach built using an L1-regularized regression approach (the least absolute shrinkage and selection operator, LASSO) and (2) autoregressive integrated moving average (ARIMA) models both with and without a seasonal component (SARIMA). In addition, for the mobility-augmented autoregressive models, human mobility is accounted for by also including lagged case data from the top five areas (i.e., origins) of travelers as covariates in the model. We compared both sets of autoregressive models to the network approach predictions using a sliding window of observation and rolling forecast target as described below. Network models Based on a previous model designed to leverage spatially-correlated cases of influenza (Lu et al., 2019), we fit a multivariate linear regression on the log of dengue case counts for area 𝑖 in month 𝑡 with the log of dengue case counts in areas 𝑗 at time 𝑡 − ℎ where is our . CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint forecasting horizon as the covariates. Let 𝑦!&' =ln$(𝑐!&' +1)
where
𝑐
is the count of cases of
location
𝑖
at time
𝑡
:
𝑦!&' = 𝛽(! +
<
𝛽"𝑦" &')*
+
",-
+𝜖.
We used a sliding window of 42 months and h between 1 and 6. All values of
𝑦!&'
were
standardized to be mean-centered with unit variance in order to ensure the coefficients are not
scale-dependent. For all prediction months, there were more areas, 77, or input variables, than
observations, 42, and thus this formulation cannot be solved using an ordinary least squares
(OLS) approach. To address this, we used an
𝐿-
regularization approach to identify a
parsimonious model that uses fewer variables as input than the number of available observations.
This penalization approach acts to both prevent overfitting as well as selecting the most
informative covariates (i.e., provinces). Specifically, we used the least absolute shrinkage and
selection operator, LASSO, which minimizes the same objective function as a regular OLS while
penalizing the number of non-zero coefficients with a hyper-parameter
𝜆
:
min
$1 𝑁 𝑦𝑋𝛽 ! !+𝜆 𝛽 " . where the magnitude of the hyper-parameter 𝜆$
is identified using cross validation on the
training set. This approach shrinks the coefficients of non-informative or redundant areas to zero
and provides for straightforward interpretation of the results allowing for identification of which
areas contributed the most predictive power for any given window of observation and target area.
Autoregressive models
As a baseline for comparing model predictions, we used autoregressive integrated
moving average (ARIMA) models, which is a common time series method applied to
epidemiological modeling and dengue forecasting. These models have been used extensively in
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint
dengue prediction efforts and often incorporate a seasonal component called Seasonal ARIMA or
SARIMA. Using the (p, d, q)(P, D, Q)s convention where p indicates the autoregressive order, d
indicates the amount of differencing, and q indicates the order of the moving average. The
seasonal component, (P, Q, D)s, represent the same parameters with a seasonal period of s
months. Additional exogenous variables (i.e., timeseries) can be added as covariates in this
framework.
We reduced the parameter space of the SARIMA models using previous literature
(Johansson et al., 2016) and our expert opinion. Specifically, we systematically search models
with lags of up to four months (p = 1, 2, 3, or 4) or three years (P = 1, 2, or 3) and include a
differencing order up to 1 (d and D = 0 or 1) and exclude all moving averages (q and Q fixed at
0). This results in a set of 15 model parameterizations: eight non-seasonal ARIMAs and seven
seasonal ARIMAs. For each parameterization, we perform a univariate SARIMA as well as a
mobility-augmented SARIMA. The mobility-augmented SARIMA incorporates the timeseries of
cases from the top five connected areas, based on observed mobility, as exogenous covariates.
Similar to the LASSO, we used a sliding window of 42 months, and in the case of augmented
SARIMA models, we lagged the exogenous covariates by
.
We show the feasibility of combining different classes of the above models by using an
ensemble approach we call the “adaptive mosaic model.” For each province and forecasting
horizon, we select the best performing model using a winner-takes-all approach based on the out-
of-sample prediction error of the previous three months. By repeating this procedure for every
prediction month, forecasting horizon, and province, the underlying base model can adapt over
time (Figure 7).
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint
Accuracy metrics and model comparison
Consistent with previous research (Lauer et al., 2018; Nicholas G Reich et al., 2016),
when assessing predictive performance of a single model, we used mean absolute error (MAE)
and when assessing the relative performance of two models, we used relative mean absolute
error (relMAE). The MAE of the log transformed counts is as follows:
𝑀𝐴𝐸 =1
𝑇
<
|ln
(
𝑦'+1
)
ln\$(𝑦
E
'+1)|
.
',-
where
𝑦'
and
𝑦'
F are the observed and average counts for prediction month
𝑡
. One
strength of this approach is that the MAE will be the same regardless of magnitude as long as the
ratios are the same (i.e., 100 and 110 for predicted and observed will result in 1.1, just as 10 and
11 or 11 and 10). This is an important feature given the differences in population size and case
counts between provinces.
When comparing model
𝐴
to model
𝐵
at forecast horizon
, we take the ratio of their
MAEs:
𝑟𝑒𝑙𝑀𝐴𝐸/&0&* =𝑀𝐴𝐸/&*
𝑀𝐴𝐸0&*.
To assess the predictive performance of each model, we used retrospective out-of-sample
estimates of the mean absolute error, assuming we only had data prior to the time of estimation
and based on a 42-month sliding window of observation. For example, the 6-month prediction
for June for one year would only include data up to December for the year before and only as far
back as 42 months from that December. This provides 42 months of evaluation data or up to 41
separate models to evaluate prediction error (noting that the number of months available in the
evaluation period is also a function of the prediction horizon). To compare across multiple
models (e.g., to find the model with the best t+1 month forecast in a single province), we used
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint
the baseline AR(1) (i.e., ARIMA(1,0,0)(0,0,0) with no exogenous variables) as our referent
model. Thus, the relMAE can be interpreted as the relative under- or over-performance of our
model compared to a standard epidemiological model, averaged over all prediction months.
To assess the utility of call detail records, for each province and forecasting horizon we
selected the best performing model of each class. We then compared the CDR SARIMA to each
other class using a Wilcoxon signed-rank test to compare the out-of-sample prediction errors.
Statistically significant differences are shown in the province-specific reports (SI Appendix, Text
S1) and in Figure S7. Similarly, we compared the proposed mosaic model to a simple AR(1)
using a Wilcoxon signed-rank test (SI Appendix, Fig. S8).
Acknowledgements
RJM and NE were supported by Asian Development Bank TA-8656. The content is
solely the responsibility of the authors and does not necessarily represent the official views of the
funders. MS was partially supported by the National Institute of General Medical Sciences of the
National Institutes of Health under Award Number R01GM130668. The content is solely the
responsibility of the authors and does not necessarily represent the official views of the National
Institutes of Health. CB and MS thank the Harvard Data Science Initiative for their support in
partially funding this collaborative work.
Author Contributions
CB conceptualized the study. MVK, CB, and MS designed the methodology. MVK
conducted all analyses. MVK prepared the original draft. All authors provided critical feedback.
KE-M, NE, DA, and RJM curated the data. MVK, MS, JTC, NK, and CB interpreted the results.
CB and RJM supervised this work. All authors reviewed and approved the submitted manuscript.
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint
References
Arino J, Driessche P van den. 2003. A multi-city epidemic model. Math Popul Stud 10:175–193.
doi:10.1080/08898480306720
Baquero OS, Santana LMR, Chiaravalloti-Neto F. 2018. Dengue forecasting in São Paulo city
with generalized additive models, artificial neural networks and seasonal autoregressive
integrated moving average models. Plos One 13:e0195065.
doi:10.1371/journal.pone.0195065
Bhatt S, Gething PW, Brady OJ, Messina JP, Farlow AW, Moyes CL, Drake JM, Brownstein JS,
Hoen AG, Sankoh O, Myers MF, George DB, Jaenisch T, Wint GR, Simmons CP, Scott TW,
Farrar JJ, Hay SI. 2013. The global distribution and burden of dengue. Nature 496:504–7.
doi:10.1038/nature12060
Buczak AL, Baugher B, Moniz LJ, Bagley T, Babin SM, Guven E. 2018. Ensemble method for
dengue prediction. Plos One 13:e0189988. doi:10.1371/journal.pone.0189988
CDC. 2018. CDC Dengue Epidemiology.
Choudhury Z, Banu S, Islam A. 2008. Forecasting dengue incidence in Dhaka, Bangladesh: A
time series analysis.
Chretien J-P, Rivers CM, Johansson MA. 2016. Make Data Sharing Routine to Prepare for
Public Health Emergencies. PLOS Medicine 13:e1002109.
doi:10.1371/journal.pmed.1002109
Cummings DA, Irizarry RA, Huang NE, Endy TP, Nisalak A, Ungchusak K, Burke DS. 2004.
Travelling waves in the occurrence of dengue haemorrhagic fever in Thailand. Nature
427:344–7. doi:10.1038/nature02225
Eastin MD, Delmelle E, Casas I, Wexler J, Self C. 2014. Intra- and Interseasonal Autoregressive
Prediction of Dengue Outbreaks Using Local Weather and Regional Climate for a Tropical
Environment in Colombia. The American Journal of Tropical Medicine and Hygiene 91:598–
610. doi:10.4269/ajtmh.13-0303
Gaughan AE, Stevens FR, Linard C, Jia P, Tatem AJ. 2013. High Resolution Population
Distribution Maps for Southeast Asia in 2010 and 2015. PLoS ONE 8:e55882.
doi:10.1371/journal.pone.0055882
Gharbi M, Quenel P, Gustave J, Cassadou S, Ruche GL, Girdary L, Marrama L. 2011. Time
series analysis of dengue incidence in Guadeloupe, French West Indies: Forecasting models
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint
using climate variables as predictors. BMC Infectious Diseases 11:166. doi:10.1186/1471-
2334-11-166
Guzman MG, Harris E. 2015. Dengue. Lancet (London, England) 385:453–65.
doi:10.1016/S0140-6736(14)60572-9
Halstead SB. 2012. Dengue vaccine development: a 75% solution? Lancet (London, England)
380:1535–6. doi:10.1016/S0140-6736(12)61510-4
Hii YL, Zhu H, Ng N, Ng LC, Rocklöv J. 2012. Forecast of dengue incidence using temperature
and rainfall. PLoS neglected tropical diseases 6:e1908. doi:10.1371/journal.pntd.0001908
Hu W, Clements A, Williams G, Tong S. 2010. Dengue fever and El Niño/Southern Oscillation
in Queensland, Australia: a time series predictive model. Occupational and Environmental
Medicine 67:307–311. doi:10.1136/oem.2008.044966
Johansson MA, Reich NG, Hota A, Brownstein JS, Santillana M. 2016. Evaluating the
performance of infectious disease forecasts: A comparison of climate-driven and seasonal
dengue forecasts for Mexico. Scientific Reports 6:33707. doi:10.1038/srep33707
Lauer SA, Sakrejda K, Ray EL, Keegan LT, Bi Q, Suangtho P, Hinjoy S, Iamsirithaworn S,
Suthachana S, Laosiritaworn Y, Cummings DAT, Lessler J, Reich NG. 2018. Prospective
forecasts of annual dengue hemorrhagic fever incidence in Thailand, 2010–2014. Proceedings
of the National Academy of Sciences 115:201714457. doi:10.1073/pnas.1714457115
Lee S, Castillo-Chavez C. 2015. The role of residence times in two-patch dengue transmission
dynamics and optimal strategies. J Theor Biol 374:152–164. doi:10.1016/j.jtbi.2015.03.005
Lewer JJ, Berg HV den. 2008. A gravity model of immigration. Economics Letters 99:164–167.
doi:10.1016/j.econlet.2007.06.019
Limkittikul K, Brett J, L’Azou M. 2014. Epidemiological Trends of Dengue Disease in Thailand
(2000–2011): A Systematic Literature Review. PLoS Neglected Tropical Diseases 8:e3241.
doi:10.1371/journal.pntd.0003241
Liu K, Sun J, Liu X, Li R, Wang Y, Lu L, Wu H, Gao Y, Xu L, Liu Q. 2018. Spatiotemporal
patterns and determinants of dengue at county level in China from 2005-2017. Int J Infect Dis
77:96–104. doi:10.1016/j.ijid.2018.09.003
Lloyd AL, Jansen VAA. 2004. Spatiotemporal dynamics of epidemics: synchrony in
metapopulation models. Math Biosci 188:1–16. doi:10.1016/j.mbs.2003.09.003
Lourenço J, Recker M. 2013. Natural, Persistent Oscillations in a Spatial Multi-Strain Disease
System with Application to Dengue. Plos Comput Biol 9:e1003308.
doi:10.1371/journal.pcbi.1003308
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint
Lu FS, Hattab MW, Clemente CL, Biggerstaff M, Santillana M. 2019. Improved state-level
influenza nowcasting in the United States leveraging Internet-based data and network
approaches. Nat Commun 10:147. doi:10.1038/s41467-018-08082-0
Luxen D, Vetter C. 2011. Real-time routing with OpenStreetMap data. Proceedings of the 19th
ACM SIGSPATIAL International Conference on Advances in Geographic Information
Systems. doi:10.1145/2093973.2094062
Luz PM, Mendes BV, Codeço CT, Struchiner CJ, Galvani AP. 2008. Time series analysis of
dengue incidence in Rio de Janeiro, Brazil. The American journal of tropical medicine and
hygiene 79:933–9.
Martinez EZ, Silva EAA, Fabbro AL. 2011. A SARIMA forecasting model to predict the number
of cases of dengue in Campinas, State of São Paulo, Brazil. Revista da Sociedade Brasileira
de Medicina Tropical 44:436–40.
NESDB. 2017. Gross Regional and Provincial Product Chain Volume Measures 2015 Edition.
National Economic and Social Development Board of Thailand.
Olliaro P, Fouque F, Kroeger A, Bowman L, Velayudhan R, Santelli AC, Garcia D, Ramm RS,
Sulaiman LH, Tejeda GS, Morales FC, Gozzer E, Garrido CB, Quang LC, Gutierrez G,
Yadon ZE, Runge-Ranzinger S. 2018. Improved tools and strategies for the prevention and
control of arboviral diseases: A research-to-policy forum. PLOS Neglected Tropical Diseases
12:e0005967. doi:10.1371/journal.pntd.0005967
Panhuis WG van, Choisy M, Xiong X, Chok NS, Akarasewi P, Iamsirithaworn S, Lam SK,
Chong CK, Lam FC, Phommasak B, Vongphrachanh P, Bouaphanh K, Rekol H, Hien NT,
Thai PQ, Duong TN, Chuang J-H, Liu Y-L, Ng L-C, Shi Y, Tayag EA, Roque VG, Suy LLL,
Jarman RG, Gibbons RV, Velasco JMS, Yoon I-K, Burke DS, Cummings DAT. 2015.
Region-wide synchrony and traveling waves of dengue across eight countries in Southeast
Asia. Proceedings of the National Academy of Sciences 112:13069–13074.
doi:10.1073/pnas.1501375112
Promprou S, Jaroensutasinee M, Jaroensutasinee K. 2006. Forecasting Dengue Haemorrhagic
Fever Cases in Southern Thailand using ARIMA Models.
Ray EL, Sakrejda K, Lauer SA, Johansson MA, Reich NG. 2017. Infectious disease prediction
with kernel conditional density estimation. Statistics in Medicine 36:4908–4929.
doi:10.1002/sim.7488
Reich Nicholas G., Lauer SA, Sakrejda K, Iamsirithaworn S, Hinjoy S, Suangtho P, Suthachana
S, Clapham HE, Salje H, Cummings DAT, Lessler J. 2016. Challenges in Real-Time
Prediction of Infectious Disease: A Case Study of Dengue in Thailand. PLOS Neglected
Tropical Diseases 10:e0004761. doi:10.1371/journal.pntd.0004761
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint
Reich Nicholas G, Lessler J, Sakrejda K, Lauer SA, Iamsirithaworn S, Cummings DAT. 2016.
Case Study in Evaluating Time Series Prediction Models Using the Relative Mean Absolute
Error. The American Statistician 70:285–292. doi:10.1080/00031305.2016.1148631
Salje H, Lessler J, Berry IM, Melendrez MC, Endy T, Kalayanarooj S, A-Nuegoonpipat A,
Chanama S, Sangkijporn S, Klungthong C, Thaisomboonsuk B, Nisalak A, Gibbons RV,
Iamsirithaworn S, Macareo LR, Yoon I-K, Sangarsang A, Jarman RG, Cummings DAT.
2017. Dengue diversity across spatial and temporal scales: Local structure and the effect of
host population size. Science 355:1302–1306. doi:10.1126/science.aaj9384
Salje H, Lessler J, Endy TP, Curriero FC, Gibbons RV, Nisalak A, Nimmannitya S,
Kalayanarooj S, Jarman RG, Thomas SJ, Burke DS, Cummings DAT. 2012. Revealing the
microscale spatial signature of dengue transmission and immunity in an urban population.
Proceedings of the National Academy of Sciences 109:9535–9538.
doi:10.1073/pnas.1120621109
Scarpino SV, Meyers LA, Johansson MA. 2017. Design Strategies for Efficient Arbovirus
Surveillance. Emerging Infectious Diseases 23:642–644. doi:10.3201/eid2304.160944
Stanaway JD, Shepard DS, Undurraga EA, Halasa YA, Coffeng LE, Brady OJ, Hay SI, Bedi N,
Bensenor IM, Castañeda-Orjuela CA. 2016. The global burden of dengue: an analysis from
the Global Burden of Disease Study 2013. The Lancet infectious diseases 16:712–723.
Stolerman L, Maia P, Kutz JN. 2016. Data-Driven Forecast of Dengue Outbreaks in Brazil: A
Critical Assessment of Climate Conditions for Different Capitals. arXiv.
Stolerman LM, Coombs D, Boatto S. 2015. SIR-Network Model and Its Application to Dengue
Fever. Siam J Appl Math 75:2581–2609. doi:10.1137/140996148
Tatem AJ, Hay SI, Rogers DJ. 2006. Global traffic and disease vector dispersal. Proceedings of
the National Academy of Sciences of the United States of America 103:6242–7.
doi:10.1073/pnas.0508391103
Tinbergen J. 1963. Shaping the world economy. The Economic Journal 5.
doi:10.1002/tie.5060050113
Wesolowski A, O’Meara WP, Eagle N, Tatem AJ, Buckee CO. 2015a. Evaluating Spatial
Interaction Models for Regional Mobility in Sub-Saharan Africa. Plos Comput Biol
11:e1004267. doi:10.1371/journal.pcbi.1004267
Wesolowski A, Qureshi T, Boni MF, Sundsøy PR, Johansson MA, Rasheed SB, Engø-Monsen
K, Buckee CO. 2015b. Impact of human mobility on the emergence of dengue epidemics in
Pakistan. Proceedings of the National Academy of Sciences 112:11887–11892.
doi:10.1073/pnas.1504964112
WHO. 2018a. Dengue Fact Sheet.
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint
WHO. 2018b. Dengue vaccine: WHO position paper - September 2018. World Health
Organization.
WHO. 2012. Global stretegy for dengue prevention and control 2012-2020. World Health
Organization.
Yamana TK, Kandula S, Shaman J. 2016. Superensemble forecasts of dengue outbreaks. Journal
of The Royal Society Interface 13:20160410. doi:10.1098/rsif.2016.0410
Yang S, Kou SC, Lu F, Brownstein JS, Brooke N, Santillana M. 2017. Advances in using
Internet searches to track dengue. PLOS Computational Biology 13:e1005607.
doi:10.1371/journal.pcbi.1005607
Zhu G, Liu J, Tan Q, Shi B. 2016. Inferring the Spatio-temporal Patterns of Dengue
Transmission from Surveillance Data in Guangzhou, China. PLOS Neglected Tropical
Diseases 10:e0004633. doi:10.1371/journal.pntd.0004633
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 28, 2020. .https://doi.org/10.1101/2020.07.22.20157966doi: medRxiv preprint
... The restrictions on people's displacement and mobility created a favorable situation to test the study's hypothesis. Even if simple modeling only partially covers complex problems such as the occurrence of dengue, advancing the understanding of the role of mobility in the epidemiology of this condition, made possible by the present study, should be a goal to be pursued [48]. Another strength of the study refers to the use of adequate statistical modeling to assess the relationship between dengue and mobility restriction, which takes into account the influence of seasonal factors inherent to the studied phenomenon, as well as lags between isolation and the occurrence of cases. ...
Article
Background Studies have shown that human mobility is an important factor in dengue epidemiology. Changes in mobility resulting from COVID-19 pandemic set up a real-life situation to test this hypothesis. Our objective was to evaluate the effect of reduced mobility due to this pandemic in the occurrence of dengue in the state of São Paulo, Brazil. Method It is an ecological study of time series, developed between January and August 2020. We use the number of confirmed dengue cases and residential mobility, on a daily basis, from secondary information sources. Mobility was represented by the daily percentage variation of residential population isolation, obtained from the Google database. We modeled the relationship between dengue occurrence and social distancing by negative binomial regression, adjusted for seasonality. We represent the social distancing dichotomously (isolation versus no isolation) and consider lag for isolation from the dates of occurrence of dengue. Results The risk of dengue decreased around 9.1% (95% CI: 14.2 to 3.7) in the presence of isolation, considering a delay of 20 days between the degree of isolation and the dengue first symptoms. Conclusions We have shown that mobility can play an important role in the epidemiology of dengue and should be considered in surveillance and control activities.
... Another point that deserves to be highlighted is that the reduction in the registration of dengue cases during the quarantine period may have been influenced by underreporting, as suggested by others [36]. should be a goal to be pursued [37]. Another strength of the study refers to the use of adequate statistical modeling to assess the relationship between dengue and mobility restriction, which takes into account the influence of seasonal factors inherent to the studied phenomenon, as well as lags between isolation and the occurrence of cases. ...
Preprint
Full-text available
Background: Studies have shown that human mobility is an important factor in dengue epidemiology. Changes in mobility resulting from COVID-19 pandemic set up a real-life situation to test this hypothesis. Our objective was to evaluate the effect of reduced mobility due to this pandemic in the occurrence of dengue in the state of S\~ao Paulo, Brazil. Method: It is an ecological study of time series, developed between January and August 2020. We use the number of confirmed dengue cases and residential mobility, on a daily basis, from secondary information sources. Mobility was represented by the daily percentage variation of residential population isolation, obtained from the Google database. We modeled the relationship between dengue occurrence and social distancing by negative binomial regression, adjusted for seasonality. We represent the social distancing dichotomously (isolation versus no isolation) and consider lag for isolation from the dates of occurrence of dengue. Results: The risk of dengue decreased around 9.1% (95% CI: 14.2 to 3.7) in the presence of isolation, considering a delay of 20 days between the degree of isolation and the dengue first symptoms. Conclusions: We have shown that mobility can play an important role in the epidemiology of dengue and should be considered in surveillance and control activities
Article
Full-text available
In the presence of health threats, precision public health approaches aim to provide targeted, timely, and population-specific interventions. Accurate surveillance methodologies that can estimate infectious disease activity ahead of official healthcare-based reports, at relevant spatial resolutions, are important for achieving this goal. Here we introduce a methodological framework which dynamically combines two distinct influenza tracking techniques, using an ensemble machine learning approach, to achieve improved state-level influenza activity estimates in the United States. The two predictive techniques behind the ensemble utilize (1) a self-correcting statistical method combining influenza-related Google search frequencies, information from electronic health records, and historical flu trends within each state, and (2) a network-based approach leveraging spatio-temporal synchronicities observed in historical influenza activity across states. The ensemble considerably outperforms each component method in addition to previously proposed state-specific methods for influenza tracking, with higher correlations and lower prediction errors.
Article
Full-text available
Objective: To identify the high risk spatiotemporal clusters of dengue cases and explore the associated risk factors. Methods: Monthly indigenous dengue cases in 2005-2017 were aggregated at county level. Spatiotemporal cluster analysis was used to explore dengue distribution features using SaTScan9.4.4 and Arcgis10.3.0. In addition, the influential factors and potential high risk areas of dengue outbreaks were analyzed using Ecological niche models in Maxent 3.3.1 software. Results: We found a heterogeneous spatial and temporal distribution pattern of dengue cases. The identified high risk region in the primary cluster covered 13 counties in Guangdong Province and in the secondary clusters included 14 counties in Yunnan Province. Additionally, there was a nonlinear association between meteorological and environmental factors and dengue outbreaks, with 8.5%∼57.1%, 6.7%∼38.3% and 3.2%∼40.4% contribution from annual average minimum temperature, land cover and annual average precipitation, respectively. Conclusions: The high risk areas of dengue outbreaks mainly locate in Guangdong and Yunnan Province, which is significantly shaped by environmental and meteorological factors, such as temperature, precipitation and landcover.
Article
Full-text available
Globally, the number of dengue cases has been on the increase since 1990 and this trend has also been found in Brazil and its most populated city-São Paulo. Surveillance systems based on predictions allow for timely decision making processes, and in turn, timely and efficient interventions to reduce the burden of the disease. We conducted a comparative study of dengue predictions in São Paulo city to test the performance of trained seasonal autoregressive integrated moving average models, generalized additive models and artificial neural networks. We also used a naïve model as a benchmark. A generalized additive model with lags of the number of cases and meteorological variables had the best performance, predicted epidemics of unprecedented magnitude and its performance was 3.16 times higher than the benchmark and 1.47 higher that the next best performing model. The predictive models captured the seasonal patterns but differed in their capacity to anticipate large epidemics and all outperformed the benchmark. In addition to be able to predict epidemics of unprecedented magnitude, the best model had computational advantages, since its training and tuning was straightforward and required seconds or at most few minutes. These are desired characteristics to provide timely results for decision makers. However, it should be noted that predictions are made just one month ahead and this is a limitation that future studies could try to reduce.
Article
Full-text available
Dengue hemorrhagic fever (DHF), a severe manifestation of dengue viral infection that can cause severe bleeding, organ impairment, and even death, affects between 15,000 and 105,000 people each year in Thailand. While all Thai provinces experience at least one DHF case most years, the distribution of cases shifts regionally from year to year. Accurately forecasting where DHF outbreaks occur before the dengue season could help public health officials prioritize public health activities. We develop statistical models that use biologically plausible covariates, observed by April each year, to forecast the cumulative DHF incidence for the remainder of the year. We perform cross-validation during the training phase (2000-2009) to select the covariates for these models. A parsimonious model based on preseason incidence outperforms the 10-y median for 65% of province-level annual forecasts, reduces the mean absolute error by 19%, and successfully forecasts outbreaks (area under the receiver operating characteristic curve = 0.84) over the testing period (2010-2014). We find that functions of past incidence contribute most strongly to model performance, whereas the importance of environmental covariates varies regionally. This work illustrates that accurate forecasts of dengue risk are possible in a policy-relevant timeframe.
Article
Full-text available
Background Research has been conducted on interventions to control dengue transmission and respond to outbreaks. A summary of the available evidence will help inform disease control policy decisions and research directions, both for dengue and, more broadly, for all Aedes-borne arboviral diseases. Method A research-to-policy forum was convened by TDR, the Special Programme for Research and Training in Tropical Diseases, with researchers and representatives from ministries of health, in order to review research findings and discuss their implications for policy and research. Results The participants reviewed findings of research supported by TDR and others. Surveillance and early outbreak warning. Systematic reviews and country studies identify the critical characteristics that an alert system should have to document trends reliably and trigger timely responses (i.e., early enough to prevent the epidemic spread of the virus) to dengue outbreaks. A range of variables that, according to the literature, either indicate risk of forthcoming dengue transmission or predict dengue outbreaks were tested and some of them could be successfully applied in an Early Warning and Response System (EWARS). Entomological surveillance and vector management. A summary of the published literature shows that controlling Aedes vectors requires complex interventions and points to the need for more rigorous, standardised study designs, with disease reduction as the primary outcome to be measured. House screening and targeted vector interventions are promising vector management approaches. Sampling vector populations, both for surveillance purposes and evaluation of control activities, is usually conducted in an unsystematic way, limiting the potentials of entomological surveillance for outbreak prediction. Combining outbreak alert and improved approaches of vector management will help to overcome the present uncertainties about major risk groups or areas where outbreak response should be initiated and where resources for vector management should be allocated during the interepidemic period. Conclusions The Forum concluded that the evidence collected can inform policy decisions, but also that important research gaps have yet to be filled.
Article
Full-text available
Background In the 2015 NOAA Dengue Challenge, participants made three dengue target predictions for two locations (Iquitos, Peru, and San Juan, Puerto Rico) during four dengue seasons: 1) peak height (i.e., maximum weekly number of cases during a transmission season; 2) peak week (i.e., week in which the maximum weekly number of cases occurred); and 3) total number of cases reported during a transmission season. A dengue transmission season is the 12-month period commencing with the location-specific, historical week with the lowest number of cases. At the beginning of the Dengue Challenge, participants were provided with the same input data for developing the models, with the prediction testing data provided at a later date. Methods Our approach used ensemble models created by combining three disparate types of component models: 1) two-dimensional Method of Analogues models incorporating both dengue and climate data; 2) additive seasonal Holt-Winters models with and without wavelet smoothing; and 3) simple historical models. Of the individual component models created, those with the best performance on the prior four years of data were incorporated into the ensemble models. There were separate ensembles for predicting each of the three targets at each of the two locations. Principal findings Our ensemble models scored higher for peak height and total dengue case counts reported in a transmission season for Iquitos than all other models submitted to the Dengue Challenge. However, the ensemble models did not do nearly as well when predicting the peak week. Conclusions The Dengue Challenge organizers scored the dengue predictions of the Challenge participant groups. Our ensemble approach was the best in predicting the total number of dengue cases reported for transmission season and peak height for Iquitos, Peru.
Article
Full-text available
As public health agencies struggle to track and contain emerging arbovirus threats, timely and efficient surveillance is more critical than ever. Using historical dengue data from Puerto Rico, we developed methods for streamlining and designing novel arbovirus surveillance systems with or without historical disease data.
Article
Full-text available
Local climate conditions play a major role in the development of the mosquito population responsible for transmitting Dengue Fever. Since the {\em Aedes Aegypti} mosquito is also a primary vector for the recent Zika and Chikungunya epidemics across the Americas, a detailed monitoring of periods with favorable climate conditions for mosquito profusion may improve the timing of vector-control efforts and other urgent public health strategies. We apply dimensionality reduction techniques and machine-learning algorithms to climate time series data and analyze their connection to the occurrence of Dengue outbreaks for seven major cities in Brazil. Specifically, we have identified two key variables and a period during the annual cycle that are highly predictive of epidemic outbreaks. The key variables are the frequency of precipitation and temperature during an approximately two month window of the winter season preceding the outbreak. Thus simple climate signatures may be influencing Dengue outbreaks even months before their occurrence. Some of the more challenging datasets required usage of compressive-sensing procedures to estimate missing entries for temperature and precipitation records. Our results indicate that each Brazilian capital considered has a unique frequency of precipitation and temperature signature in the winter preceding a Dengue outbreak. Such climate contributions on vector populations are key factors in dengue dynamics which could lead to more accurate prediction models and early warning systems. Finally, we show that critical temperature and precipitation signatures may vary significantly from city to city, suggesting that the interplay between climate variables and dengue outbreaks is more complex than generally appreciated.
Article
Creating statistical models that generate accurate predictions of infectious disease incidence is a challenging problem whose solution could benefit public health decision makers. We develop a new approach to this problem using kernel conditional density estimation (KCDE) and copulas. We obtain predictive distributions for incidence in individual weeks using KCDE and tie those distributions together into joint distributions using copulas. This strategy enables us to create predictions for the timing of and incidence in the peak week of the season. Our implementation of KCDE incorporates 2 novel kernel components: a periodic component that captures seasonality in disease incidence and a component that allows for a full parameterization of the bandwidth matrix with discrete variables. We demonstrate via simulation that a fully parameterized bandwidth matrix can be beneficial for estimating conditional densities. We apply the method to predicting dengue fever and influenza and compare to a seasonal autoregressive integrated moving average model and HHH4, a previously published extension to the generalized linear model framework developed for infectious disease incidence. The KCDE outperforms the baseline methods for predictions of dengue incidence in individual weeks. The KCDE also offers more consistent performance than the baseline models for predictions of incidence in the peak week and is comparable to the baseline models on the other prediction targets. Using the periodic kernel function led to better predictions of incidence. Our approach and extensions of it could yield improved predictions for public health decision makers, particularly in diseases with heterogeneous seasonal dynamics such as dengue fever.
Article
A fundamental mystery for dengue and other infectious pathogens is how observed patterns of cases relate to actual chains of individual transmission events. These pathways are intimately tied to the mechanisms by which strains interact and compete across spatial scales. Phylogeographic methods have been used to characterize pathogen dispersal at global and regional scales but have yielded few insights into the local spatiotemporal structure of endemic transmission. Using geolocated genotype (800 cases) and serotype (17,291 cases) data, we show that in Bangkok, Thailand, 60% of dengue cases living <200 meters apart come from the same transmission chain, as opposed to 3% of cases separated by 1 to 5 kilometers. At distances <200 meters from a case (encompassing an average of 1300 people in Bangkok), the effective number of chains is 1.7. This number rises by a factor of 7 for each 10-fold increase in the population of the “enclosed” region. This trend is observed regardless of whether population density or area increases, though increases in density over 7000 people per square kilometer do not lead to additional chains. Within Thailand these chains quickly mix, and by the next dengue season viral lineages are no longer highly spatially structured within the country. In contrast, viral flow to neighboring countries is limited. These findings are consistent with local, density-dependent transmission and implicate densely populated communities as key sources of viral diversity, with home location the focal point of transmission. These findings have important implications for targeted vector control and active surveillance.