Datasets for training and testing ML algorithms. They include different periods with different spatio-temporal characteristics.

Source publication
Article
We are witnessing the dramatic consequences of the COVID-19 pandemic which, unfortunately, go beyond the impact on the health system. Until herd immunity is achieved with vaccines, the only available mechanisms for controlling the pandemic are quarantines, perimeter closures and social distancing with the aim of reducing mobility. Governments only...

Contexts in source publication

Context 1
... then, the count of COVID-19 cases has been kept uniform, with slight changes and updates. Table 1 shows the two different periods under study, which are translated into two different datasets. For each period, train and test datasets have been designed to assess the different trend changes, as indicated in Table 1. ...
Context 2
... 1 shows the two different periods under study, which are translated into two different datasets. For each period, train and test datasets have been designed to assess the different trend changes, as indicated in Table 1. In particular, the first dataset (DS1) includes the information from July 20, 2020 to December 4, 2020. ...
Context 3
... p-value corresponding to the t-statistic of each coefficient indicates whether there is a significant relationship between the response variable (14-day CI) and each of the predictors included in the model (ensemble and mobility variables). Table 10 shows the results obtained by the NLM method for the different seed values previously described in the "Methods" section, i.e. the MAE and the number of iterations performed by the procedure in each case. Note that when the NLM seed consists of coefficients randomly generated from a uniform distribution from -10 to 10, the NLM algorithm is executed 10 times, and the MAE and number of iterations in Table 10 are calculated as the average over the 10 simulation runs. ...
Context 4
... 10 shows the results obtained by the NLM method for the different seed values previously described in the "Methods" section, i.e. the MAE and the number of iterations performed by the procedure in each case. Note that when the NLM seed consists of coefficients randomly generated from a uniform distribution from -10 to 10, the NLM algorithm is executed 10 times, and the MAE and number of iterations in Table 10 are calculated as the average over the 10 simulation runs. As can be seen, the best result is reached by performing 36 iterations of the ...
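The random-seed protocol described in these contexts (starting coefficients drawn from a uniform distribution on [-10, 10], 10 runs, averaged MAE and iteration count) can be sketched as follows. This is a minimal illustration under stated assumptions: the linear model, the toy data, and the `pattern_search` optimizer are hypothetical stand-ins and are not the NLM procedure or the multivariate model of the source publication.

```python
import random

def mae(coefs, X, y):
    """Mean absolute error of a linear model y ≈ X·coefs."""
    preds = [sum(c * x for c, x in zip(coefs, row)) for row in X]
    return sum(abs(p - t) for p, t in zip(preds, y)) / len(y)

def pattern_search(start, X, y, step=1.0, tol=1e-6, max_iter=10_000):
    """Derivative-free coordinate search (a stand-in optimizer, not NLM)."""
    coefs, best, iters = list(start), mae(start, X, y), 0
    while step > tol and iters < max_iter:
        iters += 1
        improved = False
        for j in range(len(coefs)):
            for delta in (step, -step):
                trial = coefs.copy()
                trial[j] += delta
                err = mae(trial, X, y)
                if err < best:  # accept only strict improvements
                    coefs, best, improved = trial, err, True
        if not improved:
            step /= 2  # shrink the step when no move helps
    return coefs, best, iters

def averaged_random_starts(X, y, n_runs=10, low=-10.0, high=10.0, seed=0):
    """Average the final MAE and iteration count over random starting points."""
    rng = random.Random(seed)
    maes, iterations = [], []
    for _ in range(n_runs):
        start = [rng.uniform(low, high) for _ in X[0]]
        _, err, iters = pattern_search(start, X, y)
        maes.append(err)
        iterations.append(iters)
    return sum(maes) / n_runs, sum(iterations) / n_runs

# Hypothetical toy data: intercept column plus one predictor.
X = [[1.0, float(x)] for x in range(10)]
y = [2.0 + 3.0 * x for x in range(10)]
avg_mae, avg_iters = averaged_random_starts(X, y)
print(f"average MAE: {avg_mae:.4f}, average iterations: {avg_iters:.1f}")
```

Averaging over several random starts reduces the dependence of the reported MAE and iteration count on any single starting point, which is presumably why the averaged figures are the ones tabulated.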
Context 5
... the MAE has been minimized, Table 11 presents 14-day CI predictions for an evaluation period from 5th to 18th of December using the multivariate model with the optimal coefficient values obtained by NLM for the minimum MAE. It is important to remark that if exogenous variables are not extended, 14-day CI forecasts are restricted to a five-period prediction horizon. ...

Citations

... A review of the ARIMA model literature through February 4, 2023, suggests that only human mobility has been used to predict COVID-19 transmission before universal vaccination implementation in such models [42][43][44][45][46], and that vaccination was included as a predictor in only one study [47]. With increasing vaccination coverage and SARS-CoV-2 evolution, a comprehensive set of factors should be examined and vaccination and other protective behaviors should be included as predictors to improve dynamic forecasting. ...
... The COVID-19 case growth rate, rather than the number of new cases per million, was selected because it better reflects epidemic trends and meets the stationarity condition for time-dependent trends, thereby enabling more accurate prediction [42,43]. Furthermore, the variance of the series is stabilized by taking a logarithmic approach [63]. ...
Article
Background Mathematical and statistical models are used to predict trends in epidemic spread and determine the effectiveness of control measures. Autoregressive integrated moving average (ARIMA) models are used for time-series forecasting, but only a few models of the 2019 coronavirus disease (COVID-19) pandemic have incorporated protective behaviors or vaccination, known to be effective for pandemic control. Methods To improve the accuracy of prediction, we applied newly developed ARIMA models with predictors (mask wearing, avoiding going out, and vaccination) to forecast weekly COVID-19 case growth rates in Canada, France, Italy, and Israel between January 2021 and March 2022. The open-source data were obtained from the YouGov survey and Our World in Data. Prediction performance was evaluated using the root mean square error (RMSE) and the corrected Akaike information criterion (AICc). Results A model with mask wearing and vaccination variables performed best for the pandemic period in which the Alpha and Delta viral variants were predominant (before November 2021). A model using only past case growth rates as autoregressive predictors performed best for the Omicron period (after December 2021). The models suggested that protective behaviors and vaccination are associated with the reduction of COVID-19 case growth rates, with booster vaccine coverage playing a particularly vital role during the Omicron period. For example, each unit increase in mask wearing and avoiding going out significantly reduced the case growth rate during the Alpha/Delta period in Canada (–0.81 and –0.54, respectively; both p < 0.05). In the Omicron period, each unit increase in the number of booster doses resulted in a significant reduction of the case growth rate in Canada (–0.03), Israel (–0.12), Italy (–0.02), and France (–0.03); all p < 0.05.
Conclusions The key finding of this study is that incorporating behavior and vaccination as predictors led to accurate predictions and highlighted their significant role in controlling the pandemic. These models are easily interpretable and can be embedded in a “real-time” schedule with weekly data updates. They can support timely decision making about policies to control dynamically changing epidemics.
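The quantities named in this abstract (a log-based case growth rate, RMSE, and AICc) can be illustrated with a short sketch. The growth-rate definition used here, ln(c_t / c_{t-1}), is an assumption, since the abstract does not state the exact formula; RMSE and AICc follow their standard textbook definitions, and all numbers are made up for illustration.

```python
import math

def log_growth_rates(cases):
    """Assumed definition: ln(c_t / c_{t-1}) between consecutive periods."""
    return [math.log(curr / prev) for prev, curr in zip(cases, cases[1:])]

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def aicc(log_likelihood, k, n):
    """Corrected AIC: (2k - 2 ln L) + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical weekly case counts, not data from the study.
weekly_cases = [100, 200, 400]
print(log_growth_rates(weekly_cases))
print(rmse([1, 2, 3], [1, 2, 5]))
print(aicc(log_likelihood=-10.0, k=2, n=10))
```

The correction term in AICc grows as the number of parameters `k` approaches the sample size `n`, which matters for short weekly series such as the ones described here.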
... Previous research on the topic of enhancing the accuracy of COVID-19 prediction by integrating epidemiological and mobility data (García-Cremades et al., 2021) aimed to assess various models that can be used to make early predictions about the progression of the COVID-19 outbreak, with the ultimate goal of developing a decision support system to aid policy-makers. For example, spatiotemporal disease models and graph neural networks were integrated to enhance the forecasting accuracy of weekly COVID-19 cases in Germany (Fritz et al., 2022). ...
Article
Research background: The COVID-19 pandemic has caused unprecedented disruptions to the global tourism industry, resulting in significant impacts on both human and economic activities. Travel restrictions, border closures, and quarantine measures have led to a sharp decline in tourism demand, causing businesses to shut down, jobs to be lost, and economies to suffer. Purpose of the article: This study aims to examine the correlation and causal relationship between real-time mobility data and statistical data on tourism, specifically tourism overnights, across eleven European countries during the first 14 months of the pandemic. We analyzed the short longitudinal connections between two dimensions of tourism and related activities. Methods: Our method is to use Google and Apple's observational data to link with tourism statistical data, enabling the development of early predictive models and econometric models for tourism overnights (or other tourism indices). This approach leverages the more timely and more reliable mobility data from Google and Apple, which is published with less delay than tourism statistical data. Findings & value added: Our findings indicate statistically significant correlations between specific mobility dimensions, such as recreation and retail, parks, and tourism statistical data, but poor or insignificant relations with workplace and transit dimensions. We have identified that leisure and recreation have a much stronger influence on tourism than the domestic and routine-named dimensions. Additionally, our neural network analysis revealed that Google Mobility Parks and Google Mobility Retail & Recreation are the best predictors for tourism, while Apple Driving and Apple Walking also show significant correlations with tourism data. 
The main added value of our research is that it combines observational data with statistical data, demonstrates that Google and Apple location data can be used to model tourism phenomena, and identifies specific methods to determine the extent, direction, and intensity of the relationship between mobility and tourism flows.
... Although literature has documented the direct relationship between mobility and number of cases using simulation studies and analysis of aggregated mobility data [36][37][38], community quarantine alone was shown to be ineffective in curtailing the spread of infection and brought devastating economic and societal impact on different populations [39]. As the governments design their respective lockdown exit strategy, it is crucial to maintain the awareness of the population about the persistence of the threat due to the virus and sustain its interest in disease prevention and control measures [40]. ...
Article
Background: Traditional surveillance systems rely on routine collection of data. The inherent delay in retrieval and analysis of data leads to reactionary rather than preventive measures. Forecasting and analysis of behavior-related data can supplement the information from traditional surveillance systems. Objective: We assessed the use of behavioral indicators, such as the general public's interest in the risk of contracting SARS-CoV-2 and changes in their mobility, in building a vector autoregression model for forecasting and analysis of the relationships of these indicators with the number of COVID-19 cases in the National Capital Region. Methods: An etiologic, time-trend, ecologic study design was used to forecast the daily number of cases in 3 periods during the resurgence of COVID-19. We determined the lag length by combining knowledge on the epidemiology of SARS-CoV-2 and information criteria measures. We fitted 2 models to the training data set and computed their out-of-sample forecasts. Model 1 contains changes in mobility and number of cases with a dummy variable for the day of the week, while model 2 also includes the general public's interest. The forecast accuracy of the models was compared using mean absolute percentage error. Granger causality test was performed to determine whether changes in mobility and public's interest improved the prediction of cases. We tested the assumptions of the model through the Augmented Dickey-Fuller test, Lagrange multiplier test, and assessment of the moduli of eigenvalues. Results: A vector autoregression (8) model was fitted to the training data as the information criteria measures suggest the appropriateness of 8. Both models generated forecasts with similar trends to the actual number of cases during the forecast period of August 11-18 and September 15-22. 
However, the difference in the performance of the 2 models became substantial from January 28 to February 4, as the accuracy of model 2 remained within reasonable limits (mean absolute percentage error [MAPE]=21.4%) while model 1 became inaccurate (MAPE=74.2%). The results of the Granger causality test suggest that the relationship of public interest with number of cases changed over time. During the forecast period of August 11-18, only change in mobility (P=.002) improved the forecasting of cases, while public interest was also found to Granger-cause the number of cases during September 15-22 (P=.001) and January 28 to February 4 (P=.003). Conclusions: To the best of our knowledge, this is the first study that forecasted the number of COVID-19 cases and explored the relationship of behavioral indicators with the number of COVID-19 cases in the Philippines. The resemblance of the forecasts from model 2 with the actual data suggests its potential in providing information about future contingencies. Granger causality also implies the importance of examining changes in mobility and public interest for surveillance purposes.
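The forecast-accuracy measure used in this abstract, the mean absolute percentage error (MAPE), can be sketched as below. The function name and the example numbers are illustrative assumptions, not values from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical daily case counts and forecasts.
print(mape([100, 200], [110, 180]))
```

Because each error is divided by the actual value, MAPE is scale-free, which makes it convenient for comparing forecast periods with very different case counts.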
... In [17], researchers found a consistent pattern of a sharp reduction in deaths after mobility is reduced. Other groups implemented prediction models [19][20][21][22][23][24][25] to estimate the effects of mobility reduction and predict the number of cases and deaths. These models were implemented with varying levels of complexity; for instance, [19,20] added additional variables, including (in [19]) meteorological variables, such as temperature, humidity, and rainfall, along with the correlation between mobility and COVID-19 case counts. ...
Article
Human mobility plays an important role in the spread of COVID-19. Given this knowledge, countries implemented mobility-restricting policies. Concomitantly, as the pandemic progressed, population resistance to the virus increased via natural immunity and vaccination. We address the question: “What is the impact of mobility-restricting measures on a resistant population?” We consider two factors: different types of points of interest (POIs)—including transit stations, groceries and pharmacies, retail and recreation, workplaces, and parks—and the emergence of the Delta variant. We studied a group of 14 countries and estimated COVID-19 transmission based on the type of POI, the fraction of population resistance, and the presence of the Delta variant using a Pearson correlation between mobility and the growth rate of cases. We find that retail and recreation venues, transit stations, and workplaces are the POIs that benefit the most from mobility restrictions, mainly if the fraction of the population with resistance is below 25–30%. Groceries and pharmacies may benefit from mobility restrictions when the population resistance fraction is low, whereas in parks, there is little advantage to mobility-restricting measures. These results are consistent for both the original strain and the Delta variant; Omicron data were not included in this work.
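The Pearson correlation between mobility and case growth mentioned in this abstract can be computed as in the following sketch. The two series are hypothetical, and the study's actual preprocessing (lags, smoothing, per-POI grouping) is not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical values: mobility change (%) and case growth rate.
mobility = [-40.0, -30.0, -20.0, -10.0, 0.0]
growth = [-0.10, -0.05, 0.00, 0.04, 0.12]
print(pearson(mobility, growth))
```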
... For example, contact tracing can help in predicting the evolution of the COVID-19 infections so that predictions of the peak of the epidemic become easier [25]. ...
[Fig. 4: The course of the hospital COVID-19 caseload in Kuwait if the lockdown starts 5, 10, or 15 days before the peak and lasts for 15, 30, or 45 days. The uncertainty is shown in the gray shaded areas, while the solid black curve shows the mean of the simulation results.]
Article
Background Kuwait had its first COVID-19 case in late February, and by October 6, 2020 it had recorded 108,268 cases and 632 deaths. Despite the implementation of some of the strictest control measures, including a three-week complete lockdown, there was no sign of a declining epidemic curve. The objective of the current analyses is to determine, hypothetically, the optimal timing and duration of a full lockdown in Kuwait that would result in controlling new infections and lead to a substantial reduction in case hospitalizations. Methods The analysis was conducted using a stochastic Continuous-Time Markov Chain (CTMC) eight-state model that depicts the disease transmission and spread of SARS-CoV-2. Transmission of infection occurs between individuals through social contacts at home, in schools, at work, and during other communal activities. Results The model shows that a lockdown starting 10 days before the epidemic peak and lasting 90 days is optimal, but a more realistic duration of 45 days can achieve about a 45% reduction in both new infections and case hospitalizations. Conclusions In view of the forthcoming waves of the COVID-19 pandemic anticipated in Kuwait, a correctly timed and sufficiently long lockdown represents a workable management strategy that encompasses the most stringent form of social distancing with the ability to significantly reduce transmissions and hospitalizations.
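To give a feel for how a stochastic continuous-time Markov chain epidemic model with a lockdown window behaves, here is a heavily simplified sketch. It assumes a three-state SIR chain simulated with the Gillespie algorithm; the study's eight-state structure is not described in this abstract and is not reproduced. Every parameter value and name below (`beta`, `gamma`, `contact_reduction`, and so on) is hypothetical.

```python
import random

def simulate_sir_ctmc(n=1000, i0=5, beta=0.3, gamma=0.1,
                      lockdown_start=20.0, lockdown_days=45.0,
                      contact_reduction=0.6, t_max=365.0, seed=0):
    """Gillespie simulation of a three-state SIR chain with one lockdown window."""
    rng = random.Random(seed)
    s, i, r, t = n - i0, i0, 0, 0.0
    path = [(t, s, i, r)]
    while i > 0 and t < t_max:
        # Approximation: the lockdown status is re-evaluated only at event times.
        in_lockdown = lockdown_start <= t < lockdown_start + lockdown_days
        b = beta * (1 - contact_reduction) if in_lockdown else beta
        infection_rate = b * s * i / n
        recovery_rate = gamma * i
        total = infection_rate + recovery_rate
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if rng.random() < infection_rate / total:
            s, i = s - 1, i + 1  # infection event
        else:
            i, r = i - 1, r + 1  # recovery event
        path.append((t, s, i, r))
    return path

path = simulate_sir_ctmc()
peak_infected = max(state[2] for state in path)
print(f"events: {len(path) - 1}, peak infected: {peak_infected}")
```

Running the function with different `lockdown_start` and `lockdown_days` values, over many seeds, is the kind of experiment that yields a mean curve with an uncertainty band, although the numbers from this toy chain should not be compared with the study's results.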
... 03/01/2020-06/02/2020). Google data were widely used in previous studies to evaluate the reduction in the movement of people during the COVID-19 pandemic (Siqueira et al., 2020), and the prediction of COVID-19 spread was improved by fusing epidemiological and mobility data (Al Zobbi et al., 2020;García-Cremades et al., 2021;Sulyok and Walker, 2020). ...
Article
Many countries imposed lockdown (LD) to limit the spread of COVID-19, which led to a reduction in the emission of anthropogenic atmospheric pollutants. Several studies have investigated the effects of LD on air quality, mostly in urban settings and criteria pollutants. However, less information is available on background sites, and virtually no information is available on particle number size distribution (PNSD). This study investigated the effect of LD on air quality at an urban background site representing a near coast area in the central Mediterranean. The analysis focused on equivalent black carbon (eBC), particle mass concentrations in different size fractions: PM2.5 (aerodynamic diameter Da < 2.5 μm), PM10 (Da < 10 μm), PM10-2.5 (2.5 < Da < 10 μm); and PNSD in a wide range of diameters (0.01-10 μm). Measurements in 2020 during the national LD in Italy and period immediately after LD (POST-LD period) were compared with those in the corresponding periods from 2015 to 2019. The results showed that LD reduced the frequency and intensity of high-pollution events. Reductions were more relevant during POST-LD than during LD period for all variables, except quasi-ultrafine particles and PM10-2.5. Two events of long-range transport of dust were observed, which need to be identified and removed to determine the effect of LD. The decreases in the quasi-ultrafine particles and eBC concentrations were 20%, and 15-22%, respectively. PM2.5 concentration was reduced by 13-44% whereas PM10-2.5 concentration was unaffected. The concentration of accumulation mode particles followed the behaviour of PM2.5, with reductions of 19-57%. The results obtained could be relevant for future strategies aimed at improving air quality and understanding the processes that influence the number and mass particle size distributions.
... Much has been written about the use of mobility data for surveillance and policymaking during the COVID-19 pandemic, showing its utility for evaluating non-pharmaceutical interventions and for short-term forecasting of cases, hospitalizations, and deaths. 1,[3][4][5][6]9,10,12 However, our analysis examined the opposite side of the coin: how different jurisdictions reacted (in terms of mobility) to rising case rates. In highly responsive jurisdictions, rising case rates were met by substantial reductions in mobility; in less responsive jurisdictions, only small reductions in mobility were observed for the same levels of incidence. ...
Preprint
Background: Mobile phone-derived human mobility data are a proxy for disease transmission risk and have proven useful during the COVID-19 pandemic for forecasting cases and evaluating interventions. We propose a novel metric using mobility data to characterize responsiveness to rising case rates. Methods: We examined weekly reported COVID-19 incidence and retail and recreation mobility from Google Community Mobility Reports for 50 U.S. states and nine Canadian provinces from December 2020 to November 2021. For each jurisdiction, we calculated the responsiveness of mobility to COVID-19 incidence when cases were rising. Responsiveness across countries was summarized using subgroup meta-analysis. We also calculated the correlation between the responsiveness metric and the reported COVID-19 death rate during the study period. Findings: Responsiveness in Canadian provinces (β = -1.45; 95% CI: -2.45, -0.44) was approximately five times greater than in U.S. states (β = -0.30; 95% CI: -0.38, -0.21). Greater responsiveness was moderately correlated with a lower reported COVID-19 death rate during the study period (Spearman's ρ = 0.51), whereas average mobility was only weakly correlated with the COVID-19 death rate (Spearman's ρ = 0.20). Interpretation: Our study used a novel mobility-derived metric to reveal a near-universal phenomenon of reductions in mobility subsequent to rising COVID-19 incidence across 59 states and provinces of the U.S. and Canada, while also highlighting the different public health approaches taken by the two countries. Funding: This study received no funding.
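Spearman's ρ, the rank correlation reported in this abstract, can be sketched as the Pearson correlation of average ranks. The helper names and the example values are illustrative assumptions; the study's responsiveness metric itself is not reconstructed here.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical per-jurisdiction values.
responsiveness = [-1.4, -0.9, -0.3, -0.2]
death_rate = [40.0, 55.0, 90.0, 120.0]
print(spearman(responsiveness, death_rate))
```

Because it works on ranks, ρ captures any monotonic association and is less sensitive to outlying jurisdictions than Pearson's coefficient.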
... During the COVID-19 pandemic, the use of mobile network data of user activities has seen several applications, such as to inform reopening strategies 15,16 , for informing evidence-based policy making by authorities in attempt to manage the spread of SARS-CoV-2 [17][18][19] , early detection of COVID-19 outbreaks 20,21 , and for informing COVID-19 forecast models 22 . ...
Article
Reliable forecasts of COVID-19 hospital admissions over near-term horizons can enable effective resource management, which is vital for reducing pressure on healthcare services. The use of mobile network data has drawn attention during the COVID-19 pandemic because of its ability to capture people's social behavior. Crucially, we show that there are latent features in irreversibly anonymized and aggregated mobile network data that carry useful information about the spread of the SARS-CoV-2 virus. We describe the development of forecast models that use such features to predict COVID-19 hospital admissions over near-term horizons (21 days). In a case study, we verified the approach for two hospitals in Sweden, Sahlgrenska University Hospital and Södra Älvsborgs Hospital, working closely with the experts engaged in hospital resource planning. Importantly, the results of the forecast models were used in 2021 by logisticians at the hospitals as one of the main inputs for their decisions regarding resource management.
... With these efforts, nonpharmaceutical interventions (such as national lockdowns) have been evaluated for their effectiveness and socio-economic impact on different groups [9][10][11] , models have been developed to predict disease spatial diffusion 12,13 , and scenarios have been modeled to assess their outcomes [14][15][16][17] . Studies have demonstrated that mobility data are a meaningful proxy measure of social distancing 18 , affect viral spreading 19,20 , and are useful for predicting the spread of COVID-19 [21][22][23] . ...
... Their study demonstrated that public mobility data can be used to develop reduced-form and simple models that mimic the behavior of more sophisticated epidemiological models for predicting COVID-19 cases on a 10-day basis 21 . Another study examined several state-of-the-art machine learning models and statistical methods and demonstrated how mobility data can improve prediction trends when used as exogenous information in models 22 . ...
... For Model 1, we observe a moderate and negative correlation between the prediction error rate and income (R (1-day) = -0. 22 that Model 2 -an ARIMAX with mobility data added as exogenous variable -performs better (i.e., has lower errors) in counties that share higher income, higher smartphone ownership, larger populations, and higher educational levels (see Fig. 4 for weekly representations of the weekly correlations for some of these features). On the other hand, the correlation analysis also reveals a moderate and positive relationship between the error rate and the NCHS code (R (1-day) = + 0.35, R (7-days) = + 0.39, p-value < 0.01), median age (R (1-day) = + 0.29, R (7-days) = + 0.28, p-value < 0.01), black percentage (R (1-day) = + 0.27, R (7-days) = + 0.27, p-value < 0.01). ...
Preprint
In light of the outbreak of COVID-19, analyzing and measuring human mobility has become increasingly important. A wide range of studies have explored spatiotemporal trends over time, examined associations with other variables, evaluated non-pharmacologic interventions (NPIs), and predicted or simulated COVID-19 spread using mobility data. Despite the benefits of publicly available mobility data, a key question remains unanswered: are models using mobility data performing equitably across demographic groups? We hypothesize that bias in the mobility data used to train the predictive models might lead to unfairly less accurate predictions for certain demographic groups. To test our hypothesis, we applied two mobility-based COVID infection prediction models at the county level in the United States using SafeGraph data, and correlated model performance with sociodemographic traits. Findings revealed that there is a systematic bias in models’ performance toward certain demographic characteristics. Specifically, the models tend to favor large, highly educated, wealthy, young, urban, and non-black-dominated counties. We hypothesize that the mobility data currently used by many predictive models tend to capture less information about older, poorer, non-white, and less educated regions, which in turn negatively impacts the accuracy of COVID-19 prediction in these regions. Ultimately, this study points to the need for improved data collection and sampling approaches that allow for an accurate representation of mobility patterns across demographic groups.