All content in this area was uploaded by Sui Zhang on Oct 30, 2022
Do Spatiotemporal Units Matter for Exploring the Microgeographies of
Epidemics?
Sui Zhang, Minghao Wang, Zhao Yang, Baolei Zhang*1
College of Geography and Environment, Shandong Normal University, Jinan 250014, China
Abstract: From the onset of the COVID-19 pandemic in 2020, studies on the
microgeographies of epidemics have surged. However, studies have neglected the
significant impact of multiple spatiotemporal units, such as report timestamps and
spatial scales. This study examines three cities with localized COVID-19 resurgence
after the first wave of the pandemic in mainland China to estimate the differential
impact of spatiotemporal units on exploring the influencing factors of epidemic spread
at the microscale. The quantitative analysis results suggest that future epidemiology
research should give greater attention to the “symptom onset” timestamp instead of only
the “confirmed” data and that “spatial transmission” should not be confused with
“spatial sprawling” of epidemics, which can greatly reduce comparability between
epidemiology studies. This research also highlights the importance of considering the
modifiable areal unit problem (MAUP) and the uncertain geographic context problem
(UGCoP) in future studies.
Keywords: Spatiotemporal units; Modifiable areal unit problem; Zero-inflated model;
Spatial epidemiology; Microgeography; COVID-19
1 Introduction
The novel coronavirus disease 2019 (COVID-19), which was first discovered in Wuhan,
China, in December 2019, has swept the world with high infectivity and mortality (Wu
et al., 2020a). In the past two years, the global public health crisis has penetrated almost
all fields of academia, and more than 250,000 related papers have been published
according to the Web of Science (WoS) database (www.webofscience.com). For
geographers, COVID-19's innate dual attributes of space and time have given them a
vast role to play (Rosenkrantz et al., 2021). Based on the temporal and spatial attributes
of the pandemic, geographers have made fruitful achievements in establishing
statistical analysis systems and developing abundant quantitative analytic methods for
spatiotemporal epidemiology. Both efforts have made great contributions not only to
the prevention and control of the pandemic but also to the smooth operation and
sustainable development of society in the post-pandemic era (Lai et al., 2015, Lai et al.,
2017, Kraemer et al., 2020, Guan et al., 2020a).
* Corresponding author.
E-mail addresses: zhangsui921@outlook.com (S. Zhang), wangmhsd@163.com (M. Wang),
sdnuyangz@gmail.com (Z. Yang), blzhangsd01@sdnu.edu.cn (B. Zhang).
Because of the essential role geography plays in contemporary epidemiological
research, it is of great significance to standardize research processes and the
interpretation of results, build a more communicable research network,
enhance the comparative value among studies and make the obtained results easier to
systematically summarize (Franch-Pardo et al., 2020). The selection of spatiotemporal
units has always been an unavoidable basic step in the microgeographic research of
epidemics. First, the data about infected cases usually have multiple attributes in the
time dimension, and the three attributes of “confirmed (CT)”, “quarantined/hospitalized
(QT)” and “symptom onset (ST)” are the most frequently discussed (Chen et al., 2020,
Chan et al., 2020, Huang et al., 2020). Since there is plenty of evidence to prove that
there is a certain incubation period between viral infection and symptom onset in most
epidemics and the efficiency and priority of nonpharmaceutical interventions vary
spatiotemporally, these three attributes of cases often show considerable differentiation
(Pantaleo et al., 1993, Chun et al., 1997, Chen et al., 2013, He et al., 2020, Zhai et al.,
2021, Gao et al., 2020, Bertuzzo et al., 2020, Bertozzi et al., 2020). Second, when
locating the infected cases spatially, the scale of the spatial units becomes a problem
that cannot be ignored. Generally, microgeographers analyse the geographic
information of cases at three scales: individual (IS), cluster (CS) and area
(AS). When a case is reported, the department concerned usually locates it at a home or
an activity place, with a community or a village as the smallest spatial unit, which is
represented by a point on a map. When researchers analyse the spatial
characteristics of infected cases at the individual scale, each community/village is
treated as a single location holding multiple overlapping points (individuals) (Tanser et
al., 2009, Samphutthanon et al., 2014, Wu et al., 2020b). Other studies
analyse only the specific locations (coordinates) of the places where cases
cluster (the communities/villages mentioned above) and ignore the heterogeneity of
cases within each cluster (Liu et al., 2020b). To make a more official portrait of regional
epidemic characteristics and assess transmission risks, many relevant studies and
government documents take neighbourhood-level administrative polygon as the spatial
scale for case statistics (areas), which makes the spatial pattern of cases less detailed
than the first two spatial scales (Gatrell and Bailey, 1996, Briz-Redon and Serrano-
Aroca, 2020, Yang et al., 2020, Lee et al., 2021). However, although researchers will
comprehensively consider their objectives when choosing the spatiotemporal unit, the
differentiation mentioned above and its consequences are rarely evaluated
quantitatively or discussed in depth. This makes it difficult to systematically
compare and review microgeographic epidemiological studies that use different
spatiotemporal units and greatly hinders the systematization and theorization of
spatiotemporal epidemiological research, including research on the ongoing COVID-19
pandemic (Pearce et al., 2020).
In this study, we selected three cities with local COVID-19 resurgence after the first
wave of the pandemic in mainland China. The abovementioned spatiotemporal units
were selected to analyse the microgeographic characteristics of the epidemic and the
impact on the assessment of factors that influenced epidemic transmission. The main
aims of the research were as follows: (1) to summarize the pattern of the localized
epidemic when selecting different spatiotemporal units, (2) to estimate the influence of
place-based factors, “source-case” distance and the spatial spillover effect on the
development of the localized epidemic and its temporal dynamics, and (3) to systematically
compare the differences in analysis results when selecting different spatiotemporal
units and discuss the application norms and scenarios.
2 Materials and methodology
2.1 Empirical cases
Three cases of localized epidemic resurgence in the year after the end of the first
nationwide wave of the pandemic, which began in Wuhan in January 2020, were selected in
this study. These three local outbreaks have a common characteristic, that is, they have
a single place-based epidemic source, and all cases are directly or indirectly associated
with the local source (Pang et al., 2020, Wang et al., 2021, Liu et al., 2020a, Zhang et
al., 2020): the Beijing outbreak originated from “Xinfadi Market (XFD)” in June to
July 2020, the Dalian outbreak originated from “Kaiyang Seafood Company (KSF)” in
July to August 2020 and the Tonghua outbreak originated from “Yuansheng Quality
Living Square (YQL)” in January to February 2021. Because the Health Committee of
TongHua City stopped publishing microgeographic data on asymptomatic cases later
converted to confirmed cases from January 30th, the study period for Tonghua ends on
January 29th. To further explore the spread of COVID-19, which is mainly driven by
population mobility, we took the functional urban area (FUA) delineation results using
massive Didi ride-hailing records as the local study areas (Ma and Long, 2020).
Considering the inequality of administrative areas and the complexity of quadrilateral
grids in dealing with the problem of grid adjacency, we divided the FUAs into multiple
hexagonal grids with a length of 500 m referring to the idea of community grid epidemic
control and prevention (Ling and Wen, 2020, Li and Gao, 2020, Birch et al., 2007).
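The adjacency argument for hexagonal grids can be illustrated with a minimal Python sketch. It assumes that the 500 m “length” refers to the hexagon side length and uses a hypothetical pointy-top layout in axial coordinates; neither detail is specified in this study, so the snippet is an illustration rather than the gridding procedure actually used.

```python
import math

SIDE_M = 500.0  # assumption: the "length of 500 m" is read as the hexagon side length

# Axial-coordinate offsets of the six neighbours of any hexagonal cell.
NEIGHBOUR_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_centre(q, r, side=SIDE_M):
    """Centre (x, y) in metres of the pointy-top hexagon at axial coordinates (q, r)."""
    x = side * math.sqrt(3.0) * (q + r / 2.0)
    y = side * 1.5 * r
    return x, y


def neighbour_distances(q, r, side=SIDE_M):
    """Centre-to-centre distances from cell (q, r) to its six neighbours."""
    centre = hex_centre(q, r, side)
    return [
        math.dist(centre, hex_centre(q + dq, r + dr, side))
        for dq, dr in NEIGHBOUR_OFFSETS
    ]


# Every neighbour is equally far away, so adjacency needs no special cases
# for edge-sharing versus corner-sharing cells as in a quadrilateral grid.
distances = neighbour_distances(3, -2)
```

Because all six distances are equal, a single neighbour definition suffices when a spatial weight matrix is built on such a grid.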
2.2 Data source and variable selection
The data used in this paper included case data of COVID-19 and point of interest (POI)
data. The COVID-19 case data used in this study were obtained from the Municipal
Health Commission or Center for Disease Control and Prevention of the cities involved
in this research (Beijing Municipal Health Commission: wjw.beijing.gov.cn, Beijing
Center for Disease Control and Prevention: www.bjcdc.org; Health Commission of
Dalian: hcod.dl.gov.cn; Health Committee of TongHua City:
www.tonghua.gov.cn/wjw). The POI data were obtained from Amap (www.amap.com),
and deviations were rectified manually. Catering places, residential areas, shopping
places, public service facilities, and health-care facilities were selected as the
gathering place factors. All these variables had variance inflation factor (VIF) values
of less than 7.5 at every spatial scale, which is acceptable for models with strong
interpretability.
2.3 Methodology
2.3.1 Temporal characteristics recognition
The exponentially modified Gaussian (EMG) function, which is usually used for fitting
chromatographic peaks, was used to identify the temporal trends
and the turning point of the local COVID-19 outbreak (Grushka, 1972). The days before
and after the turning point of the EMG were divided into the spread duration (SD) and
the decay duration (DD). Given that the value corresponding to the turning point of the
fitting curve is usually not an integer and that case characteristics near the turning point
are similar, an extra day of cases was taken for both durations in the analysis.
2.3.2 Spatial pattern analysis
The standard deviational ellipse (SDE) describes the geographical distribution trend by
summarizing the dispersion and directionality of the observed samples (Lefever, 1926, Yuill,
1971). It was used to make a basic judgement on the direction of the spread and the
characteristics of the epidemic. Ripley's K function, which is also suitable for
second-order point pattern analysis at the local scale, was used to evaluate the
distribution characteristics of features (clustered, random, or dispersed) on multiple
spatial scales (Getis, 1984, Boots and Getis, 1988, Wiegand and Moloney, 2004). After
the L(d) transformation, the formula is as follows:
$$L(d) = \sqrt{\frac{A \sum_{i=1}^{n} \sum_{j=1,\, j \neq i}^{n} k_{i,j}}{\pi n (n-1)}} \tag{1}$$
where A is the total area of the analysis region; d is the search radius; n is the total
number of features; and $k_{i,j}$ is a binary piecewise weight that sets the distance
threshold (typically 1 when the distance between features i and j does not exceed d and
0 otherwise). By analysing DiffK, the difference between the computed L(d) and the
corresponding d, the spatial autocorrelation pattern of feature distribution can be
quantitatively measured. The above spatial analysis and visualization were executed
with GeoDa and ArcGIS 10.7 software.
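A minimal Python sketch of formula (1) and of DiffK is given below. It assumes the simplest form of the weight (1 when two features are within distance d, 0 otherwise) and applies no edge correction, so it is an illustration rather than a reproduction of the ArcGIS implementation; the point coordinates are hypothetical.

```python
import math


def ripley_l(points, area, d):
    """L(d) from formula (1) with a binary distance weight and no edge correction."""
    n = len(points)
    pair_count = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= d
    )
    return math.sqrt(area * pair_count / (math.pi * n * (n - 1)))


def diff_k(points, area, d):
    """DiffK: observed L(d) minus d, the value expected under complete randomness."""
    return ripley_l(points, area, d) - d


# Hypothetical example: four points packed into one corner of a 100 x 100 region.
clustered = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
observed_l = ripley_l(clustered, area=100.0 * 100.0, d=2.0)
observed_diff = diff_k(clustered, area=100.0 * 100.0, d=2.0)
```

A positive DiffK at a given d indicates more neighbours within d than expected under a random pattern (clustering), and a negative value indicates dispersion.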
2.3.3 Spatial zero-inflated negative binomial models
Studies of population flow-oriented events such as epidemics usually involve count
data. Researchers have generally preferred Poisson regression or negative binomial
regression models to traditional ordinary least squares (OLS) regression for such data
(Flowerdew and Aitkin, 1982, Flowerdew and Boyle, 1995). However, especially for
clustered epidemics such as those studied here, general count models such as Poisson
and negative binomial regression often underestimate the probability of zero values.
Because there are too many zeros in the count data and the zeros can represent
different situations, the count data often show great variation. The zero-inflated
model combines a logit regression with a Poisson or negative binomial regression, so
that the estimation is divided into two separate processes to handle zero values from
different sources (Lambert, 1992, Burger et al., 2009). Taking the more widely used
zero-inflated negative binomial model as an example, the functions are as follows:
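$$\Pr(n_{ij} = 0) = \psi_{ij} + (1 - \psi_{ij}) \left( \frac{\alpha^{-1}}{\alpha^{-1} + \lambda_{ij}} \right)^{\alpha^{-1}} \tag{2.1}$$

$$\Pr(n_{ij} = k) = (1 - \psi_{ij}) \frac{\Gamma(\alpha^{-1} + k)}{k!\, \Gamma(\alpha^{-1})} \left( \frac{\alpha^{-1}}{\alpha^{-1} + \lambda_{ij}} \right)^{\alpha^{-1}} \left( \frac{\lambda_{ij}}{\alpha^{-1} + \lambda_{ij}} \right)^{k} \tag{2.2}$$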
The conditional mean λij is related to the exponential function of the regression
explanatory variable:
$$\lambda_{ij} = \exp(\beta_0 + \beta_1 \ln P_i + \beta_2 \ln P_j - \beta_3 \ln d_{ij}) \tag{3}$$
By incorporating the spatial lag term of the variables into the negative binomial
regression part of the model, formula (3) is transformed into the following formula:
$$\lambda_{ij} = \exp(\beta_0 + \beta_1 \ln P_i + \beta_2 \ln P_j + \delta W_{ij} \ln \lambda_{ij} - \beta_3 \ln d_{ij}) \tag{4}$$
When there are no excess zeros in the dependent variable, the model runs under the
classical negative binomial regression framework to ensure the efficiency of the model
in estimating the effects of independent variables. Because zero values occur
frequently in the gathering place variables, their natural logarithms cannot be
computed directly. To ensure the successful calculation of the model and the reliability
of the results, the independent variables are transformed by adding a fixed amount of
$10^{-4}$ in this research (Table 1).
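The zero-inflated negative binomial probabilities and the log-linear conditional mean can be written out as a short Python sketch. All numeric values below are hypothetical, and the spatial lag is passed in as a precomputed number; this is an illustration of the functional forms under those assumptions, not the estimation code used in this study.

```python
import math

LOG_OFFSET = 1e-4  # fixed amount added before taking logarithms, as described above


def zinb_pmf(k, lam, alpha, psi):
    """Pr(n = k) for a zero-inflated negative binomial count.

    lam is the conditional mean, alpha the dispersion parameter and psi the
    probability of an excess (structural) zero.
    """
    inv = 1.0 / alpha
    log_nb = (
        math.lgamma(inv + k) - math.lgamma(k + 1) - math.lgamma(inv)
        + inv * math.log(inv / (inv + lam))
        + k * math.log(lam / (inv + lam))
    )
    nb = math.exp(log_nb)
    if k == 0:
        return psi + (1.0 - psi) * nb
    return (1.0 - psi) * nb


def conditional_mean(betas, p_i, p_j, d_ij, delta=0.0, lag_term=0.0):
    """Conditional mean with an optional spatial lag term (delta * lag_term).

    betas = (b0, b1, b2, b3); lag_term stands for the spatially lagged log of
    the dependent variable and is assumed to be computed elsewhere.
    """
    b0, b1, b2, b3 = betas
    return math.exp(
        b0
        + b1 * math.log(p_i + LOG_OFFSET)
        + b2 * math.log(p_j + LOG_OFFSET)
        + delta * lag_term
        - b3 * math.log(d_ij)
    )


# Hypothetical values chosen only to exercise the functions.
lam = conditional_mean((1.0, 0.2, 0.1, 0.3), p_i=0.0, p_j=5.0, d_ij=1200.0)
total_probability = sum(zinb_pmf(k, 2.0, 0.5, 0.3) for k in range(400))
```

The offset keeps the logarithm finite for grids with no gathering places (p_i = 0 above), and the probabilities over all counts sum to one, which is a quick check that the two branches of the probability function are consistent.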
3 Why do spatiotemporal units matter?
Based on the time series analysis of the number of daily new COVID-19 cases,
the temporal characteristics and segmentation of the epidemic were visualized (Fig. 1).
The shapes of the fitted curves showed the trends of epidemic development under
different report timestamps. The fitted curve for “symptom onset” was almost identical
to a Gaussian curve, whereas the fitted curves for “confirmed” and “quarantined” were
strongly right-skewed. This indicated that the incidence of the COVID-19 epidemic
followed an approximately normal distribution and that its temporal characteristics
were difficult to change artificially at the present stage. A quantitative comparison of
the dates of the turning points showed that the turning points of “confirmed” and
“quarantined” arrived approximately 2-3 days ahead of the turning point of “symptom
onset”, further confirming the importance of strong investigation, quarantine and
lockdown measures for controlling the transmission speed of local epidemics in a
timely manner.
To understand the importance of spatial scale selection in epidemiological research, the
standardized spatial patterns of the localized outbreaks at three spatial scales were
mapped (Fig. 2). The standardized spatial patterns reveal strong spatial heterogeneity
in the epidemic statistics across the three spatial scales. It
is worth noting that this spatial differentiation is smaller in areas far from the epidemic
source and larger in areas close to the epidemic source, which are also the
high-incidence areas. If the three spatial scales were confused, there would be a
serious deviation in describing the spatiotemporal characteristics of the epidemic; in
particular, this deviation would greatly interfere with the analysis of high-incidence
areas, which could present great obstacles to the prevention and control of local
epidemics.
Fig. 1 Temporal characteristics of the daily new COVID-19 cases with different report timestamps:
(a)Beijing (XFD), (b)Dalian (KSF), (c)Tonghua (YQL)
Fig. 2 Spatial differentiation (standardized) of the cases aggregation with different spatial scales:
(a)Beijing (XFD), (b)Dalian (KSF), (c)Tonghua (YQL)
4 Where/when do spatiotemporal units matter?
4.1 Analysis of points’ distribution pattern
Standard deviational ellipse analysis was carried out on the points representing
communities/villages and infected cases in the two durations (Fig. 3). In terms of the
direction of epidemic transmission, the ellipses changed to a certain extent between
the two durations under all three report timestamps; the directional change
was relatively weak in the decay duration and was least obvious for “symptom
onset”. In terms of clustering, the infected cases and the
communities/villages with the “confirmed” and “quarantined” timestamps were more
dispersed in the decay duration. When clusters are used for analysis, the scope and
direction of the epidemic in the spread duration are greatly overestimated compared
with the analysis of infected cases in the same duration, which might be closely related
to the strong tension in the local source of the epidemic.
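Table 1 Descriptive statistics of variables among three outbreaks (cities)

Scale: hexagonal grids
| Variable | Definition | Abbreviation | City | Obs | Sum | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|---|---|---|---|
| Dependent variables | | | | | | | | | |
| Number of cases | Total number of COVID-19 cases per grid | Individual | (a) Beijing (XFD) | 16,807 | 337 | 0.02 | 0.67 | 0 | 64 |
| | | | (b) Dalian (KSF) | 2,617 | 81 | 0.03 | 0.54 | 0 | 18 |
| | | | (c) Tonghua (YQL) | 470 | 218 | 0.46 | 2.19 | 0 | 25 |
| Number of clusters | Total number of COVID-19 clusters per grid | Cluster | (a) Beijing (XFD) | 16,807 | 129 | 0.01 | 0.11 | 0 | 4 |
| | | | (b) Dalian (KSF) | 2,617 | 37 | 0.01 | 0.18 | 0 | 5 |
| | | | (c) Tonghua (YQL) | 470 | 116 | 0.25 | 1.08 | 0 | 9 |
| Independent variables | | | | | | | | | |
| Catering places | Total number of catering places per grid | CP | (a) Beijing (XFD) | 16,807 | 8,534 | 0.51 | 2.60 | 0 | 90 |
| | | | (b) Dalian (KSF) | 2,617 | 2,129 | 0.81 | 4.35 | 0 | 79 |
| | | | (c) Tonghua (YQL) | 470 | 283 | 0.60 | 3.31 | 0 | 36 |
| Residential areas | Total number of residential areas per grid | RA | (a) Beijing (XFD) | 16,807 | 28,485 | 1.70 | 5.23 | 0 | 92 |
| | | | (b) Dalian (KSF) | 2,617 | 3,999 | 1.53 | 4.37 | 0 | 50 |
| | | | (c) Tonghua (YQL) | 470 | 355 | 0.76 | 2.84 | 0 | 24 |
| Shopping places | Total number of shopping places per grid | SP | (a) Beijing (XFD) | 16,807 | 32,818 | 1.95 | 10.67 | 0 | 369 |
| | | | (b) Dalian (KSF) | 2,617 | 10,952 | 4.18 | 22.44 | 0 | 531 |
| | | | (c) Tonghua (YQL) | 470 | 1,551 | 3.30 | 19.55 | 0 | 274 |
| Public service facilities | Total number of public service facilities per grid | PSF | (a) Beijing (XFD) | 16,807 | 26,599 | 1.58 | 7.33 | 0 | 200 |
| | | | (b) Dalian (KSF) | 2,617 | 6,184 | 2.36 | 9.19 | 0 | 191 |
| | | | (c) Tonghua (YQL) | 470 | 586 | 1.25 | 5.63 | 0 | 64 |
| Health-care facilities | Total number of health-care facilities per grid | HCF | (a) Beijing (XFD) | 16,807 | 11,046 | 0.66 | 2.87 | 0 | 86 |
| | | | (b) Dalian (KSF) | 2,617 | 2,987 | 1.14 | 5.13 | 0 | 123 |
| | | | (c) Tonghua (YQL) | 470 | 367 | 0.78 | 3.95 | 0 | 53 |
| “Source-case” distance | Average value of the distance (m) to the local epidemic source per grid | SCD | (a) Beijing (XFD) | 16,807 | - | 44,926 | 20,407 | 395 | 95,673 |
| | | | (b) Dalian (KSF) | 2,617 | - | 23,986 | 12,544 | 331 | 59,707 |
| | | | (c) Tonghua (YQL) | 470 | - | 7,776 | 3,609 | 368 | 16,649 |

Scale: neighbourhood-level polygon
| Variable | Definition | Abbreviation | City | Obs | Sum | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|---|---|---|---|
| Dependent variable | | | | | | | | | |
| Number of cases | Total number of COVID-19 cases per polygon | Area | (a) Beijing (XFD) | 324 | 337 | 1.04 | 9.34 | 0 | 164 |
| | | | (b) Dalian (KSF) | 66 | 81 | 1.23 | 7.76 | 0 | 63 |
| | | | (c) Tonghua (YQL) | 13 | 218 | 16.77 | 18.38 | 0 | 54 |
| Independent variables | | | | | | | | | |
| Catering places | Total number of catering places per polygon | CP | (a) Beijing (XFD) | 324 | 8,534 | 26.34 | 32.25 | 0 | 174 |
| | | | (b) Dalian (KSF) | 66 | 2,129 | 30.92 | 34.91 | 0 | 173 |
| | | | (c) Tonghua (YQL) | 13 | 283 | 21.77 | 21.21 | 0 | 64 |
| Residential areas | Total number of residential areas per polygon | RA | (a) Beijing (XFD) | 324 | 28,485 | 87.98 | 83.41 | 2 | 503 |
| | | | (b) Dalian (KSF) | 66 | 3,999 | 59.18 | 79.95 | 1 | 457 |
| | | | (c) Tonghua (YQL) | 13 | 355 | 27.31 | 16.94 | 1 | 59 |
| Shopping places | Total number of shopping places per polygon | SP | (a) Beijing (XFD) | 324 | 32,818 | 101.31 | 126.38 | 0 | 963 |
| | | | (b) Dalian (KSF) | 66 | 10,952 | 163.44 | 185.72 | 0 | 879 |
| | | | (c) Tonghua (YQL) | 13 | 1,551 | 119.31 | 118.10 | 0 | 335 |
| Public service facilities | Total number of public service facilities per polygon | PSF | (a) Beijing (XFD) | 324 | 26,599 | 82.11 | 114.89 | 1 | 1,006 |
| | | | (b) Dalian (KSF) | 66 | 6,184 | 92.41 | 85.14 | 0 | 500 |
| | | | (c) Tonghua (YQL) | 13 | 586 | 45.08 | 32.98 | 1 | 100 |
| Health-care facilities | Total number of health-care facilities per polygon | HCF | (a) Beijing (XFD) | 324 | 11,046 | 34.12 | 47.70 | 0 | 509 |
| | | | (b) Dalian (KSF) | 66 | 2,987 | 44.58 | 32.91 | 0 | 150 |
| | | | (c) Tonghua (YQL) | 13 | 367 | 28.23 | 20.35 | 0 | 55 |
| “Source-case” distance | Average value of the distance (m) to the local epidemic source per polygon | SCD | (a) Beijing (XFD) | 324 | - | 32,808 | 20,679 | 3,504 | 88,166 |
| | | | (b) Dalian (KSF) | 66 | - | 20,825 | 12,533 | 3,916 | 54,720 |
| | | | (c) Tonghua (YQL) | 13 | - | 4,923 | 3,300 | 1,486 | 11,049 |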
Fig. 3 Standard deviational ellipses with different spatiotemporal units in the two durations
Based on the results for the different types of spatiotemporal units, Ripley's K function
was used to calculate the multiscale distribution characteristics of the local epidemic,
and the variation trends of the calculated DiffK with different spatiotemporal units were
visualized in a heatmap (Fig. 4). Regardless of which report timestamp or spatial
scale was used, the clustered range and intensity of the epidemic distribution were
higher in the decay duration than in the spread duration. When the “areas” were
counted, the clustered range was relatively small, but the clustered intensity was
relatively high. However, in contrast to the cluster layout, which showed a significant
clustered distribution pattern across report timestamps and durations, the degree of
clustering was no longer significant when individual infected cases were taken as the
spatial scale. It is worth noting that in the “symptom onset” statistics, although the
number of cases in the spread duration and the decay duration was relatively close,
there were still large differences in the clustered range and the clustered intensity, which
corroborates the descriptive statistical results above.
Fig. 4 Trends of DiffK with different spatiotemporal units in the two durations
4.2 Modelling result of the factors
4.2.1 Cross-sectional model recognition and the estimation result
Based on the significant results of the spatial autocorrelation of the model-dependent
variables (Z>>2.58) and the (robust) Lagrange multiplier test, spatial lag zero-inflated
negative binomial regression (SL-ZINB) was further constructed on the basis of
zero-inflated negative binomial regression (ZINB), and the Akaike Information Criterion
(AIC) values indicate that this extension improved the goodness of fit of most models
(LeSage and Pace, 2008, Middela and Ramadurai, 2020). Therefore,
in this paper, SL-ZINB (SL-NB in Dalian and Tonghua at the area scale owing to the lack
of excess zeros) was selected to estimate and discuss the distance decay effect, the effect
of spatial interaction, and the gathering place factors during the COVID-19 epidemic.
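The AIC-based comparison can be illustrated with a short Python sketch that uses two columns of AIC values reported in Table 2. The helper names are hypothetical, and the parameter count of 16 used in the consistency check is inferred from the reported log likelihood and AIC rather than stated in this paper.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(aic_by_model):
    """Name of the candidate model with the smallest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# AIC values copied from Table 2 for two of the nine city/scale combinations.
AIC_TABLE2 = {
    ("Beijing", "Individual"): {
        "NB": 1082.32, "ZINB": 1069.78, "SL-NB": 1078.85, "SL-ZINB": 1067.97,
    },
    ("Dalian", "Area"): {
        "NB": 91.55, "ZINB": 104.98, "SL-NB": 90.62, "SL-ZINB": 106.98,
    },
}

beijing_choice = best_model(AIC_TABLE2[("Beijing", "Individual")])
dalian_area_choice = best_model(AIC_TABLE2[("Dalian", "Area")])

# Consistency check: with an inferred k = 16, the reported log likelihood of
# the Beijing individual-scale model reproduces its reported SL-ZINB AIC.
beijing_aic_check = aic(-517.9831, 16)
```

For these two columns the minimum AIC selects SL-ZINB for Beijing at the individual scale and SL-NB for Dalian at the area scale, in line with the model choice described above.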
Table 2 reports the results of estimating the overall impact factors of the epidemic by
using SL-ZINB models. Regardless of which spatial scale was chosen, the source-case
distance was always one of the most important and significant factors affecting the local
clustering of COVID-19. At the same time, the spatial spillover effect of the dependent
variables also significantly promoted the development of the epidemic to a considerable
extent. However, compared with the model taking the cluster as the spatial scale, the
factors of multiple gathering places had a more significant impact in the model taking
the infected cases within neighbourhood-level polygons (area) as the spatial scale. In
the latter model, catering places and public service facilities promoted the development
of the epidemic with 99% confidence, whereas in the model taking the cluster as the
spatial scale, an effect this significant was found only for the “source-case” distance.
The “source-case” distance was thus almost the only factor that significantly affected
the transmission of COVID-19 in the cluster-scale estimates. From the overall model
estimation results, we could conclude that the expansion of the
epidemic in the spatial dimension was mainly related to the “source-case” distance and
the gathering of public service places, while the spread of the epidemic was pulled by
more kinds of gathering places (Zhang et al., 2021). Two interesting points are
worthy of attention. First, when individual infected cases were taken as the
spatial scale, catering places had a significant positive effect on the development
of the epidemic in two cities, possibly because individual-level data can better
capture the influence of activities with smaller scope and shorter duration, such as
dining, on the spread of the epidemic. Second, the decisive role of the “source-case”
distance in the development of the epidemic does not seem to exist in Tonghua, which may
be caused by the small size of Tonghua city and the limited potential of the distance effect.
4.2.2 Longitudinal variation of factors’ coefficient
During the COVID-19 outbreaks, the distance decay effect first increased and then
decreased or fluctuated around a stable level (Figs. 5-7). There was no obvious
morphological difference in the time-series variation curves of the distance decay
coefficient under different spatial scales, but the curves obtained with the three report
timestamps were strongly heterogeneous. It is
worth mentioning that, owing to strong manual intervention, the sharp increase in the
distance decay coefficient of “confirmed” and “quarantined” during the spread duration
could be foreseen. However, the coefficient of “symptom onset”, which is not subject to
human intervention and conformed to the standard Gaussian distribution, also rose
during the spread duration. This showed that, contrary to the common-sense expectation
of rapid extension and diffusion, adduction and traceability were the
main themes of the spread duration of the epidemic.
Fig. 5 Temporal variation of the coefficient of factors in the Beijing (XFD) outbreak
The spatial lag variable of the epidemic played a significant role in promoting the
spread of the epidemic. The estimation results of the impact factors of the epidemic
under the three report timestamps were similar to a certain extent: regardless of
the type of report timestamp, the positive effect of the spatial lag term on the
transmission of the epidemic showed an upwards trend during the decay duration.
However, in the spread duration of the epidemic, this homogeneity gave way to
obvious heterogeneity: the estimation results of
“confirmed” and “quarantined” in the spread duration were greatly
influenced by the selection of the spatial scale. When the infected cases were taken as
the counting object, the estimated results showed a decreasing trend; this trend, which
was not significant but was clearly present, disappeared when clusters were counted.
For the variables of the places where people gather, the temporal variation curves of four
kinds of gathering place variables did not change obviously when different spatial
scales were selected (Fig. 8). However, in the coefficient curve of health-care facilities,
heterogeneity became obvious: under all three report timestamps, the correlation
coefficients among different spatial scales showed no obvious positive significance,
and in rare cases there was even a significant negative correlation under the
“quarantined” timestamp. Especially during the spread duration of the epidemic, the
coefficients and the trends of “confirmed” and “quarantined” differed greatly in the
direction of the effect, whereas the “symptom onset” data set performed relatively well
in this respect. Combining the temporal variation of the five gathering place
coefficients and their correlation coefficients under different spatial scales, the
probability and magnitude of this heterogeneity were also the smallest when “symptom
onset” was used as the report timestamp. At the same time, the “symptom onset”
timestamp also showed characteristics different from those of the “confirmed” and
“quarantined” timestamps in the temporal difference of the coefficient estimates of each
variable (Fig. 9). At the initial stage of the epidemic, the difference between its
coefficients and the “confirmed” coefficients was more prominent than the difference
from the “quarantined” coefficients, which makes it necessary to explore the
“symptom onset” timestamp when researching the characteristics and influencing
factors of an epidemic in its initial phase. An accurate grasp of the impact factors,
especially in the initial stage of epidemic spread, is undoubtedly very important for the
prevention and control of COVID-19. Because of these results, researchers
and managers should give more attention to the selection of spatial scales when
modelling and analysing at the microscale.
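The comparison of coefficient trends across spatial scales rests on Pearson's r between two coefficient time series. A minimal Python sketch is given below; the two series are hypothetical placeholders rather than estimates from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical daily coefficient estimates of one gathering place variable
# at two spatial scales (placeholder values for illustration only).
individual_scale = [0.10, 0.14, 0.19, 0.22, 0.21]
cluster_scale = [0.08, 0.11, 0.17, 0.20, 0.18]

r_between_scales = pearson_r(individual_scale, cluster_scale)
```

A value of r close to 1 means that the coefficient follows the same temporal trend at both scales, whereas a value near 0 or below indicates the kind of scale-dependent heterogeneity discussed above.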
Fig. 6 Temporal variation of the coefficient of factors in the Dalian (KSF) outbreak
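Table 2 Cross-sectional coefficients estimation with different spatial scales

(a) Beijing (XFD)
| Variable | Individual | Cluster | Area |
|---|---|---|---|
| ln(CP) | -0.0318 | -0.0112 | -0.2868 |
| ln(RA) | -0.0441 | -0.0210 | -0.0063 |
| ln(SP) | -0.0104 | 0.0598* | -0.7272** |
| ln(PSF) | 0.0506 | 0.0057 | 1.5036*** |
| ln(HCF) | 0.0482 | 0.0372 | -0.1491 |
| ln(SCD) | -1.2669*** | -1.0174*** | -2.0279*** |
| Wij*ln(Individual) | 0.0641** | | |
| Wij*ln(Cluster) | | 0.0404 | |
| Wij*ln(Area) | | | 0.1902*** |
| Constant | 10.5714*** | 7.3666*** | 17.9401*** |
| Log likelihood | -517.9831 | -411.9 | -172.3 |
| AIC (NB) | 1082.32 | 866.62 | 396.54 |
| AIC (ZINB) | 1069.78 | 854.07 | 388.28 |
| AIC (SL-NB) | 1078.85 | 866.22 | 380.69 |
| AIC (SL-ZINB) | 1067.97 | 855.85 | 376.65 |

(b) Dalian (KSF)
| Variable | Individual | Cluster | Area |
|---|---|---|---|
| ln(CP) | 0.2027** | 0.1662* | 0.0760 |
| ln(RA) | -0.0908 | -0.0573 | 0.2577 |
| ln(SP) | 0.2340 | 0.1859 | 0.7380 |
| ln(PSF) | -0.2476 | -0.0479 | 0.3349 |
| ln(HCF) | -0.2266** | -0.1921* | -1.5365 |
| ln(SCD) | -1.6297** | -1.1120* | -4.7443*** |
| Wij*ln(Individual) | 0.0253 | | |
| Wij*ln(Cluster) | | 0.0133 | |
| Wij*ln(Area) | | | -0.1366* |
| Constant | 13.5495*** | 8.2128* | 41.9026*** |
| Log likelihood | -104.9 | -87.2 | -36.3 |
| AIC (NB) | 240.18 | 201.40 | 91.55 |
| AIC (ZINB) | 239.93 | 204.40 | 104.98 |
| AIC (SL-NB) | 241.89 | 199.44 | 90.62 |
| AIC (SL-ZINB) | 241.78 | 206.34 | 106.98 |

(c) Tonghua (YQL)
| Variable | Individual | Cluster | Area |
|---|---|---|---|
| ln(CP) | 0.0909** | 0.0575* | 2.5362*** |
| ln(RA) | -0.0550 | 0.0145 | 0.7966 |
| ln(SP) | -0.0589 | -0.0208 | -1.1443* |
| ln(PSF) | 0.0765* | 0.0918* | -0.3770 |
| ln(HCF) | -0.0068 | -0.0020 | -1.0519 |
| ln(SCD) | -0.2394 | -0.2684 | -0.6427 |
| Wij*ln(Individual) | 0.4026** | | |
| Wij*ln(Cluster) | | 0.3018** | |
| Wij*ln(Area) | | | -0.0640 |
| Constant | 3.1959 | 2.8223** | 7.9016 |
| Log likelihood | -109.8 | -81.2 | -24.5 |
| AIC (NB) | 270.97 | 194.42 | 65.67 |
| AIC (ZINB) | 262.00 | 199.67 | 73.22 |
| AIC (SL-NB) | 257.84 | 188.79 | 66.92 |
| AIC (SL-ZINB) | 251.62 | 194.34 | 76.35 |

* p<0.1 ** p<0.05 *** p<0.01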
Fig. 7 Temporal variation of the coefficient of factors in the Tonghua (YQL) outbreak
5 Which spatiotemporal units matter?
5.1 Report timestamps
First, using different time attributes of case data in analyses could yield different
results. The “confirmed” and “quarantined” timestamps had similar
characteristics in the various spatiotemporal analyses because both are subject to
strong manual intervention. However, the “symptom onset” timestamp, which could
better reflect the time and process of infection, often produced analysis results
different from those of the other two report timestamps (Chen et al., 2020, Huang et
al., 2020). Because the epidemic curve under the “symptom onset” timestamp was closer
to a normal distribution, the heterogeneity of characteristics before and after the
turning point of the epidemic was not as significant as with the other two timestamps.
When analysing or modelling the entire epidemic, choosing the “symptom onset”
timestamp data therefore made the analysis results more stable and robust. This
phenomenon occurred in the standard deviational ellipse analysis and in the temporal
variation of the modelling results. However, conventional
research was mostly based on the “confirmed” timestamp of case data (Han et al., 2021,
Guan et al., 2020b, Liu et al., 2020b). This is largely due to different research
purposes and to the difficulty of acquiring “symptom onset” data and their
uncertainty. As epidemic-related research deepens, the
requirements for data mining are increasing (Zhou et al., 2020). Researchers should
consider that the “confirmed” timestamp data are deceptive to some extent, and similar
data produced after forced manual intervention cannot accurately reflect the actual
transmission of SARS-CoV-2. Such data can easily cause cognitive bias or even lead to
the opposite judgement about the epidemic. In this respect, the data with the “symptom
onset” timestamp performed better. Therefore, we suggest using more “symptom onset”
data in future epidemiological studies, which not only requires academic circles to
reach a consensus but also requires the government to manage the
epidemiological information of cases more scientifically and transparently (Sun et al.,
2020, Xia et al., 2020).
Fig. 8 Pearson’s r between the temporal trends of coefficients with different spatiotemporal units:
(a)Beijing (XFD), (b)Dalian (KSF), (c)Tonghua (YQL)
5.2 Spatial scales
When selecting the spatial scale of cases, point pattern analysis gave
completely different results on whether the epidemic was clustered and on the
clustered range of the epidemic. With some spatial scales, the degree of clustering was
high and the clustered range was large; with others, clustering was often not
significant and the clustered range was relatively small. In the econometric modelling
and the temporal variation of the estimated coefficients, the results obtained with
epidemic data at different spatial scales could not resolve this problem either: it was
easy to conclude that the clusters were not significantly affected by the
factors of gathering places, while the infected cases were significantly
affected by many factors of gathering places. The effect of place on the transmission of
COVID-19 often appeared opposite at different spatial scales. This difference
was not unexpected. The spatial scale with points as the epidemic research unit was
chosen for its high accuracy, and it could be approximately regarded as reaching the
individual level. Thus, the dependence and sensitivity of its research results on the
spatial scale could be foreseen (Kwan, 2018). Even so, most previous related
studies ignored the bias that might be introduced into the research results during the
data preparation process and referred to the research objects generically as the
“COVID-19 epidemic”. The results calculated with these two spatial scales represent two
dimensions that should be distinguished in spatial epidemiology: “spatial sprawling”
and “spatial transmission”. The results of this study showed that the two are different
and should not be confused. This differentiation in the selection of spatial scales
reduced the reference value of these studies for subsequent research and greatly
diminished their regional comparative value (Rex et al., 2020, Wu et al., 2020b,
Liu et al., 2020b).
Finally, the impact of the size and shape of epidemic spatial boundaries on the results
of spatiotemporal epidemiological studies was also reported in this paper. Previous
related studies were mostly macro- or large-scale studies; their research units were
usually administrative regions or large grid units, and the selection principle of data
spatial boundaries caused less disturbance in their research results (Xiong et al., 2020,
Michelozzi et al., 2020, Arab-Mazar et al., 2020, Giuliani et al., 2020). In contrast, the
above results indirectly reflect the differences brought about by different principles
for selecting spatial boundaries. The spatial modelling results of the econometric
models provide sufficient evidence for a significant spatial spillover effect of the
COVID-19 epidemic at the microlevel. This spillover effect will greatly affect the
estimation results when the size or shape of the research units changes (LeSage and
Pace, 2008). Therefore, we suggest that, when individual-level space-time data are not
available, the smallest areal unit of data or grids with a small area should be used as
often as possible, as they are less constrained by arbitrary spatiotemporal units. It is
necessary to discuss the modifiable areal unit problem (MAUP) and the uncertain
geographic context problem (UGCoP) in future microscale spatiotemporal epidemiological
studies (Dungan et al., 2002, Kwan, 2012, Kwan, 2018). Given this, academic circles
should perhaps pay more attention to the role of remote sensing and raster data in
epidemiological research because of their high resolution.
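The sensitivity to areal units discussed above can be illustrated with a small sketch
that aggregates the same point data to square grids of different cell sizes and reports
how the summary statistics change. The coordinates, the study-area extent, and the
function names (grid_counts, summarize) are hypothetical and chosen only for
illustration; the sketch is not the grid design used in this study.

```python
from collections import Counter

# Hypothetical case locations in metres (projected coordinates).
POINTS = [
    (120, 80), (180, 150), (220, 240), (260, 210), (310, 330),
    (450, 470), (520, 510), (560, 590), (880, 120), (940, 910),
]
EXTENT = 1000  # side length of the square study area, in metres


def grid_counts(points, cell_size):
    """Count the points that fall in each square cell of the given size."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)


def summarize(points, cell_size, extent=EXTENT):
    """Return (number of cells, share of empty cells, maximum cell count)."""
    counts = grid_counts(points, cell_size)
    cells_per_side = extent // cell_size
    n_cells = cells_per_side ** 2
    zero_share = 1 - len(counts) / n_cells
    return n_cells, zero_share, max(counts.values())


# The same ten points look very different as the areal unit shrinks.
for size in (500, 250, 100):
    n_cells, zero_share, max_count = summarize(POINTS, size)
    print(f"cell={size} m: cells={n_cells}, empty={zero_share:.2f}, max={max_count}")
```

With these toy points, the share of empty cells rises from 0.25 to 0.92 and the largest
cell count falls from 6 to 2 as the cell size shrinks from 500 m to 100 m. Any count
model fitted to the aggregated data would therefore face a different dependent variable
at each cell size, which is one way the MAUP can enter the estimation.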
Fig. 9 Temporal characteristics of standardized differentiation between coefficients with different report
timestamps: (a) Beijing (XFD), (b) Dalian (KSF), (c) Tonghua (YQL)
5.3 Limitations and uncertainties
First, the epidemic was modelled based on distance, spatial effect, and places where
people gather; however, the model did not include all the factors that affect the
transmission of epidemics, such as population density, economic activity intensity, and
population mobility. Second, although this study shows that the selection of
spatiotemporal units is very important in the microlevel study of the COVID-19
epidemic, the results do not provide enough evidence to show whether clusters or
infected cases are the better spatial scale for epidemic research. The discussion based
on clusters could better reflect the “sprawling” process of the epidemic in the spatial
dimension, while the number of infected cases can better explain the focus and bias of
the epidemic, which needs much follow-up for validation. In addition, it should not be
neglected that the open-source case data used in this study, which were independently
compiled by the Municipal Health Commissions and the Centers for Disease Control and
Prevention, were not the most detailed and accurate. Although we approximated the data
using consistent and scientific principles, the possible errors or bias caused by this
approximation in such microscale research are still worth discussing, and the accurate
locations of the relevant case points need to be further corrected against the
government’s internal data. Finally, although COVID-19 was undoubtedly a typical
spatial pandemic, the results of this study can only be regarded as suggestive and
cautionary. The heterogeneity caused by different spatiotemporal units and their
advantages and disadvantages need larger-scale verification and additional types of
studies.
6 Conclusion
Our research highlights the importance of the selection of spatiotemporal units in
microgeographic epidemiology studies through the multidimensional spatiotemporal
analysis of COVID-19 in three typical localized outbreaks in mainland China. Different
report timestamps and spatial scales had a significant impact on the results of
spatiotemporal analysis and modelling. This heterogeneity was reflected in the
clustering and directionality of epidemic spatial distribution, its factors, and their
temporal variation. Given the better robustness of the “symptom onset” timestamp in the
quantitative results, we suggest that more attention should be given to it instead of
focusing only on “confirmed” timestamps, because “confirmed” data are often deceptive
and unstable in epidemiological research. Spatial scales also deserve emphasis in
microscale studies of epidemics at the level of individual units. Accordingly, the
specific object of follow-up epidemiological studies should be stated more clearly;
that is, spatial transmission should not be confused with the spatial “sprawling” of
epidemics, as such confusion greatly reduces the possibility of systematizing research
results. In addition, the modifiable areal unit problem (MAUP) and the uncertain
geographic context problem (UGCoP) deserve more in-depth discussion in the microscale
study of epidemics.
References
Arab-Mazar, Z., Sah, R., Rabaan, A. A., et al. (2020) 'Mapping the incidence of the
COVID-19 hotspot in Iran – Implications for Travellers', Travel Medicine and
Infectious Disease, 34, pp. 101630. DOI:
10.1016/j.tmaid.2020.101630.
Bertozzi, A. L., Franco, E., Mohler, G., et al. (2020) 'The challenges of modeling and
forecasting the spread of COVID-19', Proceedings of the National Academy of
Sciences of the United States of America, 117(29), pp. 16732-16738. DOI:
10.1073/pnas.2006520117.
Bertuzzo, E., Mari, L., Pasetto, D., et al. (2020) 'The geography of COVID-19 spread
in Italy and implications for the relaxation of confinement measures', Nature
Communications, 11(1). DOI: 10.1038/s41467-020-18050-2.
Birch, C. P. D., Oom, S. P. and Beecham, J. A. (2007) 'Rectangular and hexagonal
grids used for observation, experiment and simulation in ecology', Ecological
Modelling, 206(3-4), pp. 347-359. DOI: 10.1016/j.ecolmodel.2007.03.041.
Boots, B. N. and Getis, A. (1988) Point Pattern Analysis. Morgantown: Regional
Research Institute, West Virginia University.
Briz-Redon, A. and Serrano-Aroca, A. (2020) 'A spatio-temporal analysis for
exploring the effect of temperature on COVID-19 early evolution in Spain',
Science of the Total Environment, 728. DOI: 10.1016/j.scitotenv.2020.138811.
Burger, M., Van Oort, F. and Linders, G.-J. (2009) 'On the Specification of the Gravity
Model of Trade: Zeros, Excess Zeros and Zero-inflated Estimation', Spatial
Economic Analysis, 4(2), pp. 167-190. DOI: 10.1080/17421770902834327.
Chan, J. F.-W., Yuan, S., Kok, K.-H., et al. (2020) 'A familial cluster of pneumonia
associated with the 2019 novel coronavirus indicating person-to-person
transmission: a study of a family cluster', Lancet, 395(10223), pp. 514-523.
DOI: 10.1016/s0140-6736(20)30154-9.
Chen, N., Zhou, M., Dong, X., et al. (2020) 'Epidemiological and clinical
characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan,
China: a descriptive study', Lancet, 395(10223), pp. 507-513. DOI:
10.1016/s0140-6736(20)30211-7.
Chen, Y., Liang, W., Yang, S., et al. (2013) 'Human infections with the emerging avian
influenza A H7N9 virus from wet market poultry: clinical analysis and
characterisation of viral genome', Lancet, 381(9881), pp. 1916-1925. DOI:
10.1016/s0140-6736(13)60903-4.
Chun, T. W., Carruth, L., Finzi, D., et al. (1997) 'Quantification of latent tissue
reservoirs and total body viral load in HIV-1 infection', Nature, 387(6629), pp.
183-8. DOI: 10.1038/387183a0.
Dungan, J. L., Perry, J. N., Dale, M. R. T., et al. (2002) 'A balanced view of scale in
spatial statistical analysis', Ecography, 25(5), pp. 626-640. DOI:
10.1034/j.1600-0587.2002.250510.x.
Flowerdew, R. and Aitkin, M. (1982) 'A method of fitting the gravity model based on
the Poisson distribution', Journal of Regional Science, 22(2), pp. 191-202.
DOI: 10.1111/j.1467-9787.1982.tb00744.x.
Flowerdew, R. and Boyle, P. J. (1995) 'Migration models incorporating
interdependence of movers', Environment & Planning A, 27(9), pp. 1493-1502.
Franch-Pardo, I., Napoletano, B. M., Rosete-Verges, F., et al. (2020) 'Spatial analysis
and GIS in the study of COVID-19. A review', Science of the Total
Environment, 739. DOI: 10.1016/j.scitotenv.2020.140033.
Gao, S., Rao, J. M., Kang, Y. H., et al. (2020) 'Association of Mobile Phone Location
Data Indications of Travel and Stay-at-Home Mandates With COVID-19
Infection Rates in the US', JAMA Network Open, 3(9). DOI:
10.1001/jamanetworkopen.2020.20485.
Gatrell, A. C. and Bailey, T. C. (1996) 'Interactive spatial data analysis in medical
geography', Social science & medicine (1982), 42(6), pp. 843-55. DOI:
10.1016/0277-9536(95)00183-2.
Getis, A. (1984) 'Interaction Modeling Using Second-Order Analysis', Environment
and Planning A: Economy and Space, 16(2), pp. 173-183. DOI:
10.1068/a160173.
Giuliani, D., Dickson, M. M., Espa, G., et al. (2020) 'Modelling and Predicting the
Spatio-Temporal Spread of Coronavirus Disease 2019 (COVID-19) in Italy',
SSRN Electronic Journal. DOI: 10.2139/ssrn.3559569.
Grushka, E. (1972) 'Characterization of exponentially modified Gaussian peaks in
chromatography', Analytical chemistry, 44(11), pp. 1733-8. DOI:
10.1021/ac60319a011.
Guan, D., Wang, D., Hallegatte, S., et al. (2020a) 'Global supply-chain effects of
COVID-19 control measures', Nature Human Behaviour. DOI:
10.1038/s41562-020-0896-8.
Guan, W., Ni, Z., Hu, Y., et al. (2020b) 'Clinical Characteristics of Coronavirus
Disease 2019 in China', New England Journal of Medicine, 382(18), pp. 1708-
1720. DOI: 10.1056/NEJMoa2002032.
Han, Y., Yang, L., Jia, K., et al. (2021) 'Spatial distribution characteristics of the
COVID-19 pandemic in Beijing and its relationship with environmental
factors', Science of the Total Environment, 761. DOI:
10.1016/j.scitotenv.2020.144257.
He, F., Deng, Y. and Li, W. (2020) 'Coronavirus disease 2019: What we know?',
Journal of Medical Virology, 92(7), pp. 719-725. DOI: 10.1002/jmv.25766.
Huang, C., Wang, Y., Li, X., et al. (2020) 'Clinical features of patients infected with
2019 novel coronavirus in Wuhan, China', Lancet, 395(10223), pp. 497-506.
DOI: 10.1016/s0140-6736(20)30183-5.
Kraemer, M. U. G., Yang, C.-H., Gutierrez, B., et al. (2020) 'The effect of human
mobility and control measures on the COVID-19 epidemic in China', Science,
368(6490), pp. 493-497. DOI: 10.1126/science.abb4218.
Kwan, M.-P. (2012) 'The Uncertain Geographic Context Problem', Annals of the
Association of American Geographers, 102(5), pp. 958-968. DOI:
10.1080/00045608.2012.687349.
Kwan, M.-P. (2018) 'The Limits of the Neighborhood Effect: Contextual Uncertainties
in Geographic, Environmental Health, and Social Science Research', Annals of
the American Association of Geographers, 108(6), pp. 1482-1490. DOI:
10.1080/24694452.2018.1453777.
Lai, S., Huang, Z., Zhou, H., et al. (2015) 'The changing epidemiology of dengue in
China, 1990-2014: a descriptive analysis of 25 years of nationwide
surveillance data', BMC Medicine, 13. DOI: 10.1186/s12916-015-0336-1.
Lai, S., Zhou, H., Xiong, W., et al. (2017) 'Changing Epidemiology of Human
Brucellosis, China, 1955-2014', Emerging Infectious Diseases, 23(2), pp. 184-
194. DOI: 10.3201/eid2302.151710.
Lambert, D. (1992) 'Zero-Inflated Poisson Regression, With an Application to Defects
in Manufacturing', Technometrics, 34(1), pp. 1-14. DOI:
10.1080/00401706.1992.10485228.
Lee, D., Robertson, C. and Marques, D. (2021) 'Quantifying the small-area spatio-
temporal dynamics of the Covid-19 pandemic in Scotland during a period with
limited testing capacity', Spatial Statistics, pp. 100508. DOI:
10.1016/j.spasta.2021.100508.
Lefever, D. W. (1926) 'Measuring geographic concentration by means of the standard
deviational ellipse', American journal of sociology, 32(1), pp. 88-94.
LeSage, J. P. and Pace, R. K. (2008) 'Spatial Econometric Modeling of
Origin-Destination Flows', Journal of Regional Science, 48(5), pp.
941-967. DOI: 10.1111/j.1467-9787.2008.00573.x.
Li, Z. and Gao, G. F. (2020) 'Strengthening public health at the community-level in
China', Lancet Public Health, 5(12), pp. E629-E630.
Ling, C. and Wen, X. (2020) 'Community grid management is an important measure
to contain the spread of novel coronavirus pneumonia (COVID-19)',
Epidemiology and Infection, 148. DOI: 10.1017/s0950268820001739.
Liu, P., Yang, M., Zhao, X., et al. (2020a) 'Cold-chain transportation in the frozen
food industry may have caused a recurrence of COVID-19 cases in
destination: Successful isolation of SARS-CoV-2 virus from the imported
frozen cod package surface', Biosafety and health, 2(4), pp. 199-201. DOI:
10.1016/j.bsheal.2020.11.003.
Liu, S., Qin, Y., Xie, Z., et al. (2020b) 'The Spatio-Temporal Characteristics and
Influencing Factors of Covid-19 Spread in Shenzhen, China-An Analysis
Based on 417 Cases', International Journal of Environmental Research and
Public Health, 17(20). DOI: 10.3390/ijerph17207450.
Ma, S. and Long, Y. (2020) 'Functional urban area delineations of cities on the
Chinese mainland using massive Didi ride-hailing records', Cities, 97. DOI:
10.1016/j.cities.2019.102532.
Michelozzi, P., de'Donato, F., Scortichini, M., et al. (2020) 'Temporal dynamics in
total excess mortality and COVID-19 deaths in Italian cities', BMC Public
Health, 20(1). DOI: 10.1186/s12889-020-09335-8.
Middela, M. S. and Ramadurai, G. (2020) 'Incorporating spatial interactions in zero-
inflated negative binomial models for freight trip generation', Transportation.
DOI: 10.1007/s11116-020-10132-w.
Pang, X., Ren, L., Wu, S., et al. (2020) 'Cold-chain food contamination as the possible
origin of COVID-19 resurgence in Beijing', National Science Review, 7(12),
pp. 1861-1864. DOI: 10.1093/nsr/nwaa264.
Pantaleo, G., Graziosi, C., Demarest, J. F., et al. (1993) 'HIV infection is active and
progressive in lymphoid tissue during the clinically latent stage of disease',
Nature, 362(6418), pp. 355-8. DOI: 10.1038/362355a0.
Pearce, N., Vandenbroucke, J. P., VanderWeele, T. J., et al. (2020) 'Accurate Statistics
on COVID-19 Are Essential for Policy Guidance and Decisions', American
Journal of Public Health, 110(7), pp. 949-951. DOI:
10.2105/ajph.2020.305708.
Rex, F. E., de Souza Borges, C. A. and Kafer, P. S. (2020) 'Spatial analysis of the
COVID-19 distribution pattern in Sao Paulo State, Brazil', Ciencia & Saude
Coletiva, 25(9), pp. 3377-3384. DOI: 10.1590/1413-81232020259.17082020.
Rosenkrantz, L., Schuurman, N., Bell, N., et al. (2021) 'The need for GIScience in
mapping COVID-19', Health & Place, 67. DOI:
10.1016/j.healthplace.2020.102389.
Samphutthanon, R., Tripathi, N. K., Ninsawat, S., et al. (2014) 'Spatio-Temporal
Distribution and Hotspots of Hand, Foot and Mouth Disease (HFMD) in
Northern Thailand', International Journal of Environmental Research and
Public Health, 11(1), pp. 312-336. DOI: 10.3390/ijerph110100312.
Sun, Z., Zhang, H., Yang, Y., et al. (2020) 'Impacts of geographic factors and
population density on the COVID-19 spreading under the lockdown policies
of China', Science of the Total Environment, 746. DOI:
10.1016/j.scitotenv.2020.141347.
Tanser, F., Barnighausen, T., Cooke, G. S., et al. (2009) 'Localized spatial clustering
of HIV infections in a widely disseminated rural South African epidemic',
International Journal of Epidemiology, 38(4), pp. 1008-1016. DOI:
10.1093/ije/dyp148.
Wang, X.-Y., Zhang, Y.-Q. and Cai, L.-W. (2021) 'Spatiotemporal characteristics of
the COVID-19 resurgence in the metropolitan wholesale market of Beijing,
China', Journal of travel medicine, 28(2). DOI: 10.1093/jtm/taab008.
Wiegand, T. and Moloney, K. A. (2004) 'Rings, circles, and null-models for point
pattern analysis in ecology', Oikos, 104(2), pp. 209-229. DOI: 10.1111/j.0030-
1299.2004.12497.x.
Wu, F., Zhao, S., Yu, B., et al. (2020a) 'A new coronavirus associated with human
respiratory disease in China', Nature, 579(7798), pp. 265-269. DOI:
10.1038/s41586-020-2008-3.
Wu, Y., Yan, X., Zhao, S., et al. (2020b) 'Association of time to diagnosis with
socioeconomic position and geographical accessibility to healthcare among
symptomatic COVID-19 patients: A retrospective study in Hong Kong',
Health & Place, 66. DOI: 10.1016/j.healthplace.2020.102465.
Xia, W., Liao, J., Li, C., et al. (2020) 'Transmission of corona virus disease 2019
during the incubation period may lead to a quarantine loophole', medRxiv, pp.
2020.03.06.20031955. DOI: 10.1101/2020.03.06.20031955.
Xiong, Y., Wang, Y., Chen, F., et al. (2020) 'Spatial Statistics and Influencing Factors
of the COVID-19 Epidemic at Both Prefecture and County Levels in Hubei
Province, China', International Journal of Environmental Research and Public
Health, 17(11). DOI: 10.3390/ijerph17113903.
Yang, W., Deng, M., Li, C., et al. (2020) 'Spatio-Temporal Patterns of the 2019-nCoV
Epidemic at the County Level in Hubei Province, China', International
Journal of Environmental Research and Public Health, 17(7). DOI:
10.3390/ijerph17072563.
Yuill, R. S. (1971) 'The standard deviational ellipse; an updated tool for spatial
description', Geografiska Annaler: Series B, Human Geography, 53(1), pp. 28-
39.
Zhai, W., Liu, M. Y., Fu, X. Y., et al. (2021) 'American Inequality Meets COVID-19:
Uneven Spread of the Disease across Communities', Annals of the American
Association of Geographers, 111(7), pp. 2023-2043. DOI:
10.1080/24694452.2020.1866489.
Zhang, S., Yang, Z., Wang, M., et al. (2021) '“Distance-Driven” Versus “Density-
Driven”: Understanding the Role of “Source-Case” Distance and Gathering
Places in the Localized Spatial Clustering of COVID-19—A Case Study of the
Xinfadi Market, Beijing (China)', GeoHealth, 5(8), pp. e2021GH000458.
DOI: 10.1029/2021GH000458.
Zhang, Y., Pan, Y., Zhao, X., et al. (2020) 'Genomic characterization of SARS-CoV-2
identified in a reemerging COVID-19 outbreak in Beijing's Xinfadi market in
2020', Biosafety and health, 2(4), pp. 202-205. DOI:
10.1016/j.bsheal.2020.08.006.
Zhou, C., Su, F., Pei, T., et al. (2020) 'COVID-19: Challenges to GIS with Big Data',
Geography and Sustainability, 1(1), pp. 77-87. DOI:
10.1016/j.geosus.2020.03.005.