
G-DIF: A geospatial data integration framework to rapidly estimate post-earthquake damage



While unprecedented amounts of building damage data are now produced after earthquakes, stakeholders do not have a systematic method to synthesize and evaluate damage information, thus leaving many datasets unused. We propose a Geospatial Data Integration Framework (G-DIF) that employs regression kriging to combine a sparse sample of accurate field surveys with spatially exhaustive, though uncertain, damage data from forecasts or remote sensing. The framework can be implemented after an earthquake to produce a spatially distributed estimate of damage and, importantly, its uncertainty. An example application with real data collected after the 2015 Nepal earthquake illustrates how regression kriging can combine a diversity of datasets (and downweight uninformative sources), reflecting its ability to accommodate context-specific variations in data type and quality. Through a sensitivity analysis on the number of field surveys, we demonstrate that with only a few surveys, this method can provide more accurate results than a standard engineering forecast.
Sabine Loos a), M.EERI, David Lallemant b), M.EERI, Jack Baker a), M.EERI, Jamie, Sang-Ho Yun c), Nama Budhathoki d), Feroz Khan b), Ritika Singh d)
From rapid engineering forecasts to crowdsourced maps, unprecedented amounts of building
damage data are now being produced after earthquakes. The 2010 Haiti earthquake was the
first time that response and recovery stakeholders had access to this amount of damage data,
due to both technological advancements in remote sensing data acquisition and mandates to
make that data openly available after major disasters (Corbane et al., 2011; Kerle and Hoffman,
2013). In fact, after 2010 there was a spike in the number of damage-related maps posted on
ReliefWeb (a global information sharing site devoted to humanitarian disasters) in response to
major earthquakes, even though those earthquakes had estimated economic damages similar to
earlier events (Figure 1).
a) Stanford University, Stanford, CA 94305
b) Earth Observatory of Singapore, Nanyang Technological University, Singapore
c) Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA 91109
d) Kathmandu Living Labs, Kathmandu, Nepal
e) Institute for Environmental Decisions, Dept. Environmental Systems Science, ETH Zürich, Zürich, Switzerland
Figure 1. The number of damage-related maps posted on ReliefWeb, a disaster information sharing
site, has increased since the 2010 Haiti earthquake. We would expect a similar number of maps for
major events with similar estimated economic damages (shown in 2019 USD). Map counts
were scraped from ReliefWeb, and economic damages were retrieved from EM-DAT (United Nations
Office for the Coordination of Humanitarian Affairs, 2019; Université catholique de Louvain (UCL) -
CRED and Guha-Sapir).
Counterintuitively, the increase in data is problematic since stakeholders–such as affected
governments, multilateral donor organizations, and humanitarian organizations–receive a bar-
rage of information and maps with unverified competing damage estimates (Kerle, 2013). Often,
data from new and untested methods are left unused when decisions need to be made quickly
(Hunt and Specht, 2019). Stakeholders do not have a systematic method to quickly assess the
accuracy of these data sources or to synthesize them. Furthermore, damage is commonly quantified
using metrics that stakeholders cannot use to make crucial decisions within weeks of
an earthquake (Bhattacharjee et al., 2018). For example, in as little as two weeks, the affected
government uses damage data to estimate total losses for the Post Disaster Needs Assessments
(PDNA) to request recovery aid. It is unclear how to 1) translate multiple remotely-sensed
damage maps that show damage intensity per pixel, like the maps shown in Kerle and Hoffman
(2013), to usable metrics to estimate loss and 2) know which map is most accurate. If damage
estimates are inaccurate in the PDNA, the affected government could under- or overestimate the
amount of aid requested—and subsequently distributed—for recovery. Because of these issues,
many damage data are left unused. This paper outlines a Geospatial Data Integration Frame-
work (G-DIF) to systematically integrate multiple sources of damage data into a single spatially
distributed estimate of damage with quantified uncertainty to ease decision-making and improve
the accuracy of post-earthquake damage estimates.
Integrating post-earthquake damage data is challenging since they are produced at differ-
ent times with varying geospatial coverages, formats, and levels of uncertainty. While a few
research studies have attempted to improve the accuracy of remote sensing and crowdsourced
damage data, none have developed generalized methods to combine multiple data sources into
a single, high-resolution, and spatially distributed estimate of building damage. For example,
Booth et al. (2011) used Bayesian analysis to update the ratio of collapsed buildings in an af-
fected area from manual assessments of satellite imagery with additional satellite assessments
and field surveys after the 2010 Haiti earthquake but produced collapse probability distributions
for four low-resolution land-use classes rather than high-resolution spatial estimates. Alterna-
tively, some studies treat post-earthquake damage data as inputs and validation for vulnerability
curves within an engineering forecast (e.g. Gunasekera et al., 2018; Huyck, 2015), but do not
update the final damage estimate itself. Rather than estimating damage, some studies have used
multiple damage data to develop maps of shaking intensity (e.g. Monfort et al., 2019). Finally,
Lallemant and Kiremidjian (2013) applied cokriging to integrate a crowdsourced assessment
with a set of field surveys, but this method was not generalized to incorporate multiple damage
data sources.
As opposed to existing methods, which rely on only one to two damage datasets, we pro-
pose a framework that is able to integrate multiple heterogeneous data sources to produce a
single spatial damage prediction in the weeks after an earthquake. Specifically, the geostatis-
tical model, regression kriging, implemented in G-DIF requires a limited sample of primary
damage data from field surveys, which are accurate but have low spatial coverage, to predict
damage using secondary damage data, which have lower accuracy but higher spatial coverage.
Within this framework, we employ a geostatistical integration method, since damage to
nearby buildings is likely correlated within the range of spatial correlation of ground motion
because of similarities in construction age and material, local soil conditions, and multiple other
factors (Shome et al., 2012). By modeling this spatial correlation parametrically, G-DIF does
not rely on large field survey samples as training data, unlike most machine learning models.
Therefore, instead of relying on a model that is built with training data from one location and
may not transfer well between different built environments and different data sources, G-DIF
can be developed after an event using its specific data, leading to locally calibrated damage
estimates. Because of these features, similar geostatistical techniques have been previously ap-
plied to integrate data in other fields such as for mapping atmospheric optical thickness (e.g.
Chatterjee et al., 2010) and soil properties (e.g. Hengl et al., 2004; Thompson et al., 2010).
In this paper, we illustrate the implementation of the framework with an example application
using real damage data collected after the 2015 Nepal Earthquake. In this example, we show
how G-DIF produces a single map of damage and a map of the estimation uncertainty, which
can be used to model economic losses and guide further field surveying, respectively. Compared
to traditional methods of rapidly estimating post-earthquake damage, G-DIF produces a damage
estimate that has lower overall error and higher resolution and is specific to each context.
G-DIF makes use of two types of damage information: primary measurement data with high
accuracy and sparse spatial coverage, plus secondary proxy data with low accuracy and dense
spatial coverage. Examples of primary data include field surveys of damage, while secondary data
include engineering forecasts, remotely-sensed proxies, or relevant geospatial covariates from
before or after the event (e.g. intensity or elevation). All information is assumed to be numerical
(e.g. collapse rate) rather than descriptive (e.g. social media posts). In this section, we outline
the time of availability and format of the damage data suited for G-DIF, as shown in Figure 2.
Field surveys of damaged buildings are often conducted following earthquakes. These include
surveys conducted by reconnaissance teams to understand the scale and type of building dam-
age, rapid engineering safety evaluations to inform people of the safety of reoccupying build-
ings, and detailed, recovery-oriented surveys as time progresses (Earthquake Engineering Re-
search Institute, 2015; Lallemant et al., 2017). These field surveys include an evaluation of
the level of damage for each inspected building. The two most prevalent methods to assign
damage levels are the ATC-20 methodology and the EMS-98 grading system, where engineers
classify building damage in damage states or grades, respectively, based on descriptive damage
conditions (Applied Technology Council, 1989; Grünthal, 1998).
Figure 2. Timeline of availability of post-earthquake damage data suited for G-DIF based on Lallemant
et al. (2017)'s review of damage assessments. Data sources with lower accuracy but dense spatial
coverage are available soonest after an earthquake. Once a limited sample of field surveys is collected,
enough data are available for G-DIF. The time to collect a sufficient number of field surveys can vary by
region (in Nepal, it could feasibly be done in a couple of weeks); however, a couple of weeks is sufficient
for early recovery decisions.
Since engineers inspect
each building from the ground, field survey assessments are the most accurate measurement
of damage relative to other damage data. The timing of early field surveys varies between
disasters—past examples from the REACH survey, the government, and reconnaissance teams
have shown organized surveys to be conducted in the first 6 weeks (Shelter Cluster Nepal, 2015;
Lallemant et al., 2017; Earthquake Engineering Research Institute, 2015). While full coverage
of on-the-ground surveys takes months to even years after a major event, G-DIF leverages these
early surveys to provide calibration of predictions and constraints at the survey locations.
Engineering forecasts are near-real-time predictions of regional impact available within hours,
as soon as a map of shaking intensity can be derived from the magnitude and location of the
earthquake source (Jaiswal et al., 2009). Multiple global systems exist, the most widely used
being the Prompt Assessment of Global Earthquakes for Response (PAGER) system (Jaiswal
and Wald, 2011). These systems typically use an analytical or empirical model that relates shak-
ing intensity to impact measures such as building damage, casualties, or economic loss. These
models usually rely on information on the earthquake shaking in terms of peak ground mo-
tion or intensity, building and population exposure, and fragility functions (Erdik et al., 2014).
While systems like PAGER aggregate their models to country-level impact estimates, alternative
systems, such as the Quake Loss Assessment for Response and Mitigation (QLARM), provide
spatially distributed model predictions (Trendafiloski et al., 2009). Since engineering forecasts
are model-based, rather than observation-based, these predictions are inherently uncertain, es-
pecially in regions with limited seismic stations and building inventory data (Wald et al., 2012;
Erdik et al., 2014).
Remote sensing-derived damage data are observations related to damage, retrieved from earth
observation technologies such as sensors mounted on satellites, aircraft, or unmanned aerial
vehicles. These signals can be interpreted automatically through computer algorithms or manu-
ally by humans, each with a range of formats (Dong and Shan, 2013; Kerle, 2013). Depending
on the interpretation method, the data are either damage proxies, which provide an idea of
damage intensity, or assessments, which provide direct measurements of damage. For example,
the Advanced Rapid Imaging and Analysis project at NASA's Jet Propulsion Laboratory
and California Institute of Technology produces damage proxy maps (DPM) for major disasters
based on automatic change detection between two pairs of images from interferometric
synthetic-aperture radar (InSAR) data, thus providing a measure of intensity (Yun et al., 2015).
Alternatively, digital humanitarian groups, such as Humanitarian OpenStreetMap Team (HOT)
or the Global Earth Observation-Catastrophe Assessment Network (GEO-CAN), have manually
identified damaged and collapsed buildings in optical satellite and aerial imagery, respectively
(Westrope et al., 2014; Loos et al., 2018; Ghosh et al., 2011). The availability of remote sensing-
derived damage data depends on the retrieval of the underlying remote sensing data—typically
within a few days to a couple of weeks (Dong and Shan, 2013; Lallemant et al., 2017). While
remotely sensed damage data have denser spatial coverage than field surveys, these estimates
have varying accuracy depending on the type of imagery or interpretation used (Loos et al.,
2018; Dong and Shan, 2013; Monfort et al., 2019).
Our goal is to estimate the true building damage, Z, which is the assigned damage grade for
a building from a field survey. We formulate the true damage as a function of location, s, so
Z(s) is a continuous variable. The region is discretized into a grid, so that Z(s) is defined at
a countable number of locations. When the grid dimension encompasses multiple buildings,
Z can be defined as the average damage grade (hereon referred to as mean damage) of the
buildings or the fraction of buildings that fall within a given grade.
We consider the true damage as a random spatial process composed of two parts: 1) the
mean surface, which is the average damage throughout space and 2) small-scale fluctuations
around the mean surface. In the case of earthquake-induced building damage, the mean surface
will exhibit a general trend in space, because of characteristics such as shaking intensity that
have large-scale spatial variation. We model this trend parametrically. We expect the small-scale
fluctuations (hereon the residuals) to exist, resulting from smaller-scale similarities in
attributes such as construction characteristics and local soil conditions. Because of these
small-scale similarities, we model the residuals as stochastic and spatially auto-correlated, that is,
the residuals at two locations are correlated with each other.
The true building damage Z at a single location s can therefore be represented as the sum
of the trend, m(s), and stochastic residual, ε(s),
$$Z(s) = m(s) + \varepsilon(s). \tag{1}$$
To illustrate, consider two communities A and B: community A is closer to the earthquake
source and experienced greater shaking, and therefore damage, than the more distant community
B. The average difference in damage between A and B is represented by the trend, m(s).
Beyond that, the buildings in the grids in and around A are constructed similarly (built with
the same material in the same year), causing similar damage. The local similarities in damage
surrounding a grid are represented by the spatially correlated residual ε(s).
Note that Z(s) is defined as the true damage, since a field surveyed assessment is
the most accurate measurement of damage available after an earthquake. Uncertainty in a field
survey still exists due to the subjectivity of the surveyor and the additional uncertainty introduced
by aggregating the surveys to a grid. Here, however, we consider Z(s) to be exact
and only account for the uncertainty in the estimation of the trend and the spatially-correlated
residuals.
G-DIF capitalizes on 1) the correlation between the sparse field surveys and secondary dam-
age data to estimate the trend and 2) the auto-correlation between the field surveys to estimate
the residuals. The geostatistical data integration model implemented in G-DIF is regression
kriging (also known as residual kriging), a multivariate geostatistical regression technique,
which consists of two separate models for the trend and the residuals (Odeh et al., 1994).
Separate modeling of the trend and residuals allows for alternative regressions that consider
nonlinear relationships between primary and secondary data and separate interpretation of each
model’s results. The main steps of the framework are in Figure 3.
Figure 3. G-DIF steps to produce spatial estimates of regional damage.
We separate the input data for G-DIF into two sets of locations. There are $p$ secondary datasets,
$X_1, \ldots, X_p$, that are spatially exhaustive and available at all $n$ locations, with an additional set of
primary field survey data at a subset of $n_{fs}$ locations. The collocated primary and secondary
data at the $n_{fs}$ field survey locations are used for developing a regression function, which is
then used to estimate the trend at all $n$ locations. Similarly, the spatial correlation model is
developed using the $n_{fs}$ field locations. Generally, the set of field surveys should be large
enough to build a regression model for the trend ($n_{fs} \gg p$) and have samples at each damage
level and at varying distances from each other. In this paper, we assume that the set of field
surveys includes observations of the full range of damage levels and is carried out at random
grids distributed throughout the spatial domain in order to produce unbiased estimates of the
trend and variogram (this assumption has important implications for survey sampling, which
we revisit in the sensitivity analysis and conclusion sections). The vector of field surveys ($\mathbf{Z}$)
and matrix of secondary datasets ($\mathbf{X}$) for model development are
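$$\mathbf{Z} = \begin{bmatrix} Z(s_1) \\ \vdots \\ Z(s_{n_{fs}}) \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} X_1(s_1) & \cdots & X_p(s_1) \\ \vdots & \ddots & \vdots \\ X_1(s_{n_{fs}}) & \cdots & X_p(s_{n_{fs}}) \end{bmatrix}$$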
To model the trend, we develop a regression function, f, which predicts the true damage at
the field survey locations, Z, as a function of the damage from the secondary data, X. We use
the developed regression function to estimate the trend at a single, unknown location, s0:
$$\hat{m}(s_0) = f(X(s_0)). \tag{2}$$
The function $f$ is the modeler's choice and will generally be earthquake-specific. Because the
choice of trend model is likely to be dependent on the data available, it is important to develop
this function manually to obtain accurate estimates of the final damage. It is common to ap-
ply ordinary least squares (OLS) for trend estimation. Alternatively, generalized least squares
(GLS), which weights observations by their spatial covariance, accounts for spatial correlation
in the residuals and leads to an unbiased estimate of the coefficient. The use of GLS leads to re-
sults most similar to estimating the trend and residual simultaneously, as with universal kriging
(Hengl et al., 2003; Chiles and Delfiner, 2012). In either formulation, both linear and nonlinear
least-squares regression functions can be applied. Other functions such as generalized additive
models, regression trees, and artificial neural networks have also been explored within this general
approach (McBratney et al., 2000; Grujic, 2017; Motaghian and Mohammadi, 2011). In addi-
tion, separate trend models can be developed for different regions that have varying coverage
of secondary data. This could be the case for imagery-based damage data that can be limited in
geographical extent, which we demonstrate in our application to Nepal.
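The two least-squares options discussed above can be sketched as follows. This is an illustrative sketch on synthetic data, not the authors' implementation: the function names, the synthetic covariates, and the exponential covariance passed to the GLS fit are all assumptions made for the example.

```python
import numpy as np

def fit_trend_ols(X, z):
    """OLS coefficients for z ~ [1, X]; X is (n_fs, p), z is (n_fs,)."""
    A = np.column_stack([np.ones(len(z)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    return beta

def fit_trend_gls(X, z, C):
    """GLS coefficients, weighting observations by the covariance C (n_fs, n_fs)."""
    A = np.column_stack([np.ones(len(z)), X])
    Ci_A = np.linalg.solve(C, A)                # C^{-1} A
    Ci_z = np.linalg.solve(C, z)                # C^{-1} z
    return np.linalg.solve(A.T @ Ci_A, A.T @ Ci_z)

def predict_trend(beta, X_new):
    """Evaluate the fitted linear trend at new locations."""
    return beta[0] + np.atleast_2d(X_new) @ beta[1:]

# Synthetic stand-in data (hypothetical, not from the Nepal case study).
rng = np.random.default_rng(0)
n_fs, p = 100, 3
coords = rng.uniform(0, 50, size=(n_fs, 2))     # survey locations in km
X = rng.normal(size=(n_fs, p))                  # secondary data at survey locations
z = 3.0 + X @ np.array([0.8, 0.3, 0.0]) + rng.normal(scale=0.5, size=n_fs)

# Assumed exponential covariance between survey locations, used only by GLS.
h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
C = 0.25 * np.exp(-h / 9.4)

beta_ols = fit_trend_ols(X, z)
beta_gls = fit_trend_gls(X, z, C)
residuals = z - predict_trend(beta_ols, X)      # residuals at the survey locations
```

With an identity covariance matrix the GLS fit reduces to OLS, which is a convenient sanity check when comparing the two trend models.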
With the developed trend function we estimate the trend at all $n$ locations and calculate the
residuals at each of the $n_{fs}$ field surveyed locations:
$$\varepsilon(s_\alpha) = Z(s_\alpha) - \hat{m}(s_\alpha), \quad \text{for } \alpha = 1, \ldots, n_{fs}. \tag{3}$$
Using the calculated residuals, we perform ordinary kriging to estimate the residuals at the un-
known locations using a spatial correlation model. The estimated residual at a single, unknown
location is the weighted sum of the known residuals from the field surveyed locations
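$$\hat{\varepsilon}(s_0) = \sum_{\alpha=1}^{n_{fs}} \lambda_\alpha \, \varepsilon(s_\alpha), \tag{4}$$
where $\lambda_\alpha$ are the kriging weights.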
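We solve for the kriging weights, $\boldsymbol{\lambda} = [\lambda_1 \ldots \lambda_{n_{fs}}]$, by minimizing the estimation variance at
the surveyed locations and placing a constraint on the sum of the weights to equal one to satisfy
the unbiasedness conditions assumed with ordinary kriging (Chiles and Delfiner, 2012):
$$L(\boldsymbol{\lambda}, \nu) = \mathrm{var}\left( \hat{\varepsilon}(s_\alpha) - \varepsilon(s_\alpha) \right) + 2\nu \left( \sum_{\alpha=1}^{n_{fs}} \lambda_\alpha - 1 \right). \tag{5}$$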
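We obtain the $\boldsymbol{\lambda}$ that minimizes Equation 5 by introducing a Lagrange multiplier $\nu$ and setting
the function's partial derivatives with respect to $\boldsymbol{\lambda}$ and $\nu$ equal to zero. This results in the
following ordinary kriging system of $n_{fs} + 1$ equations with $n_{fs} + 1$ unknowns ($\boldsymbol{\lambda}$ and $\nu$):
$$\begin{bmatrix} \mathbf{C} & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{\lambda} \\ \nu \end{bmatrix} = \begin{bmatrix} \mathbf{C}_0 \\ 1 \end{bmatrix}, \tag{6}$$
where $\mathbf{C}$ is the $n_{fs} \times n_{fs}$ auto-covariance matrix between the known residuals, $\mathbf{1}$ and $\boldsymbol{\lambda}$ are $n_{fs} \times 1$ vectors, and $\mathbf{C}_0$ is the $n_{fs} \times 1$ covariance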
between the new estimation location and all field survey locations. Here, we assume second-
order stationarity of the residuals, meaning the autocovariance is the same for any two points
based on their separation distance, h, and irrespective of their location. The auto-covariance C
is derived from a variogram, a concept similar to the correlation models used for ground-motion
intensities (Boore et al., 2003; Goda and Hong, 2008; Jayaram and Baker, 2009). The variogram
is a theoretical parametric model of spatial correlation that relates the separation distance h
between field surveyed locations and the dissimilarity of their residuals. Dissimilarity in the
variogram is quantified using half the variance, or the empirical semivariance
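$$\gamma(h) = \frac{1}{2} \, \mathrm{var}\left[ \varepsilon(s) - \varepsilon(s + h) \right], \tag{7}$$
where $h$ is the Euclidean distance. A theoretical variogram is then fit through all $(\gamma, h)$ pairs.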
Selection of an appropriate theoretical variogram should again be based on the lowest error from
cross-validation (Oliver and Webster, 2014).
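The variogram step can be sketched in a few lines. The sketch below is an assumption-laden illustration rather than the authors' code: it bins pairwise squared residual differences into distance classes to get an empirical semivariogram, then fits an exponential variogram model, γ(h) = b(1 - exp(-h/r)), by a simple grid search over candidate sills and ranges. The function names, the binning scheme, and the grid search are all choices made for this example; cross-validation, as recommended above, is not shown.

```python
import numpy as np

def empirical_semivariogram(coords, resid, bin_edges):
    """Binned semivariance: half the mean squared residual difference per distance bin."""
    i, j = np.triu_indices(len(resid), k=1)            # all unique pairs
    h = np.linalg.norm(coords[i] - coords[j], axis=1)  # separation distances
    sq = 0.5 * (resid[i] - resid[j]) ** 2
    which = np.digitize(h, bin_edges) - 1              # bin index per pair
    n_bins = len(bin_edges) - 1
    lags = np.full(n_bins, np.nan)
    gamma = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = which == b
        if mask.any():
            lags[b] = h[mask].mean()
            gamma[b] = sq[mask].mean()
    return lags, gamma

def fit_exponential_variogram(lags, gamma, sills, ranges):
    """Grid search for (sill b, range r) minimizing the residual sum of squares."""
    ok = ~np.isnan(gamma)
    best = (np.inf, None, None)
    for b in sills:
        for r in ranges:
            model = b * (1.0 - np.exp(-lags[ok] / r))
            rss = np.sum((gamma[ok] - model) ** 2)
            if rss < best[0]:
                best = (rss, b, r)
    return best[1], best[2]

# Tiny worked example: three residuals along a line (hypothetical values).
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
resid = np.array([0.0, 1.0, 3.0])
lags, gamma = empirical_semivariogram(coords, resid, np.array([0.5, 1.5, 2.5]))
# lags -> [1.0, 2.0], gamma -> [1.25, 4.5]
```

In the worked example, the two pairs separated by 1 unit contribute 0.5 and 2.0 (mean 1.25), and the single pair separated by 2 units contributes 4.5.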
The final damage estimate at a single location is obtained by adding together the estimated trend
and residuals from Equations 2 and 4, respectively, as shown in Equation 1
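$$\hat{Z}(s_0) = f(X(s_0)) + \sum_{\alpha=1}^{n_{fs}} \lambda_\alpha \, \varepsilon(s_\alpha). \tag{8}$$
Once we develop the final damage estimate for all locations, $\hat{\mathbf{Z}}$, it can be used to estimate further
decision variables (e.g. the spatial distribution of economic losses).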
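In addition, this method provides the variance of the damage estimate, $\hat{\sigma}^2(s_0)$, which is the
sum of the individual variances from estimating the trend, $\hat{\sigma}^2_m(s_0)$, and kriging the residuals,
$\hat{\sigma}^2_\varepsilon(s_0)$:
$$\hat{\sigma}^2(s_0) = \hat{\sigma}^2_m(s_0) + \hat{\sigma}^2_\varepsilon(s_0). \tag{9}$$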
The estimation variance can be used to propagate uncertainty in further loss estimates or to
guide where to carry out additional field surveys.
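To make the prediction step concrete, the sketch below solves the ordinary kriging system for one unknown location and combines the kriged residual with a trend estimate. It is a minimal illustration under stated assumptions: the exponential covariance form b·exp(-h/r), the four survey coordinates, the residual values, and the trend estimate and variance at the new location are hypothetical; only the parameter values b = 0.83 and r = 9.4 km are taken from the case study.

```python
import numpy as np

def exp_cov(h, b, r):
    """Exponential covariance model (an assumed form): C(h) = b * exp(-h / r)."""
    return b * np.exp(-np.asarray(h) / r)

def krige_residual(coords, resid, s0, b, r):
    """Ordinary kriging estimate and variance of the residual at location s0."""
    n = len(resid)
    H = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    h0 = np.linalg.norm(coords - s0, axis=1)
    # Kriging system: [[C, 1], [1^T, 0]] [lambda; nu] = [C0; 1]
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = exp_cov(H, b, r)
    K[:n, n] = 1.0
    K[n, :n] = 1.0
    rhs = np.append(exp_cov(h0, b, r), 1.0)
    sol = np.linalg.solve(K, rhs)
    lam, nu = sol[:n], sol[n]
    estimate = lam @ resid                  # weighted sum of known residuals
    variance = b - lam @ rhs[:n] - nu       # ordinary kriging variance
    return estimate, variance, lam

# Hypothetical survey locations (km) and residuals from a fitted trend.
coords = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [8.0, 8.0]])
resid = np.array([0.4, -0.2, 0.1, -0.5])
trend_s0, trend_var_s0 = 3.1, 0.05          # hypothetical trend estimate and variance

eps_hat, eps_var, lam = krige_residual(coords, resid, np.array([2.0, 2.0]), b=0.83, r=9.4)
z_hat = trend_s0 + eps_hat                  # trend plus kriged residual
var_hat = trend_var_s0 + eps_var            # sum of the two variances
```

Because no nugget is included, the kriged residual reproduces the observed residual exactly at a surveyed location, with zero kriging variance there.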
In this section, we demonstrate the applicability of G-DIF by using real data produced after
the 2015 Mw 7.8 Nepal earthquake to estimate damage over the 11 heavily affected and mostly
rural districts outside of Kathmandu Valley. We assume this model would have been applied
approximately two to four weeks following an earthquake (i.e., the vertical line in Figure 2)
when enough field surveys are available to implement G-DIF. For this example, we use field
surveys at 100 random locations plus representative data sources for each type of secondary
damage data. We present this case study in order of the flowchart of Figure 3.
The measurement unit and spatial support of each input dataset used in this case study are listed in
Table 1.
Table 1. Data from the 2015 Nepal earthquake used in the application of G-DIF
| Damage data category | Dataset used in case study | Measurement unit | Spatial support |
|---|---|---|---|
| Field surveys | EMS-98 field surveys (Z) | Damage grade | Building-level |
| Engineering forecast | Self-developed (X1) | Mean damage ratio | 1 km grid |
| Remote sensing proxy | InSAR-based damage proxy map (X2) | Damage proxy map value | 30 m grid |
| Relevant geospatial covariates | ShakeMap (X3) | Modified Mercalli Intensity | 1.75 km grid |
| | Digital Elevation Model (X4) | Elevation (m) | 90 m grid |
The damage survey data for this case study come from the Earthquake Housing Damage and
Characteristics Survey commissioned by the Government of Nepal and completed by July 2016
( The purpose of that survey was to identify rural households that
would be eligible beneficiaries for the Earthquake Housing Reconstruction Program and was
therefore carried out in the 11 rural most-affected districts, not including the three districts in
Kathmandu Valley (Nepal Earthquake Housing Reconstruction Multi-Donor Trust Fund, 2016).
In this survey, trained engineers used the EMS-98 damage grading system to classify a census
of 751,799 buildings in these districts into a damage grade from 1 (negligible to slight damage)
to 5 (collapse). While this exhaustive survey was completed a year after the earthquake, we
consider only a random sample of 100 locations in order to replicate what would be available
rapidly after an event.
We developed an engineering forecast dataset with similar methods and quality to engineer-
ing forecasts available after earthquakes in countries with limited building inventory data. We
use fragility curves from Nepal’s National Society of Earthquake Technology to relate the peak
ground acceleration from the latest ShakeMap to damage ratios for masonry (mud and cement
mortared), reinforced concrete, and wood structures (JICA, 2002; Worden et al., 2018). The ex-
posure is defined using population estimates from the LandScan 2011 High Resolution Global
Population Dataset and ratios of each construction type available at the district-level in Nepal’s
2011 census (Bright et al., 2012). Given the estimated number of buildings, the estimated distri-
bution of each construction type, and the fragility curve for each construction type, we compute
the mean damage ratio per grid.
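The exposure-weighted calculation described above can be sketched as follows. The curve parameters and construction-type shares are hypothetical placeholders, not the values from JICA (2002) or the 2011 census, and the lognormal curve is used here only as a stand-in for a damage-ratio function of peak ground acceleration.

```python
import math

# Hypothetical lognormal curve parameters: (median PGA in g, log-standard deviation).
# Illustrative placeholders only, not the curves used in the case study.
CURVES = {
    "mud_masonry": (0.25, 0.6),
    "cement_masonry": (0.45, 0.6),
    "reinforced_concrete": (0.80, 0.6),
    "wood": (0.60, 0.6),
}

def damage_ratio(pga_g, median, beta):
    """Lognormal CDF used as a stand-in damage-ratio curve."""
    if pga_g <= 0:
        return 0.0
    x = math.log(pga_g / median) / beta
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mean_damage_ratio(pga_g, type_shares):
    """Exposure-weighted mean damage ratio for one grid cell."""
    total = sum(type_shares.values())
    return sum(share / total * damage_ratio(pga_g, *CURVES[t])
               for t, share in type_shares.items())

# Hypothetical district-level construction-type shares applied to one grid cell.
shares = {"mud_masonry": 0.7, "cement_masonry": 0.2,
          "reinforced_concrete": 0.05, "wood": 0.05}
mdr_low = mean_damage_ratio(0.1, shares)    # weak shaking
mdr_high = mean_damage_ratio(0.5, shares)   # strong shaking
```

Because the shares are normalized inside `mean_damage_ratio`, they can be given either as fractions or as building counts.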
For the remote sensing proxy, we use NASA’s damage proxy map (DPM) (Yun et al., 2015).
NASA has consistently produced a DPM after major disasters since the February 2011 M6.3
Christchurch earthquake, making it a relevant remote sensing proxy to include in this study.
The DPM algorithm takes the difference between two InSAR coherence (or similarity) maps:
one from before the earthquake and one spanning the earthquake. The DPM value in each
pixel (which ranges from -1 to 1) represents anomalous change due to the earthquake, as opposed to
background changes (noise) that existed in the pre-earthquake pair coherence.
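The coherence-difference idea can be illustrated with a deliberately simplified sketch. It only subtracts two coherence maps on a shared grid; the operational DPM processing described by Yun et al. (2015) involves additional steps that are not reproduced here. The sign convention (positive values mean coherence loss across the earthquake) and the array values are assumptions for the example.

```python
import numpy as np

def damage_proxy(coh_pre, coh_co):
    """Coherence loss per pixel: pre-event coherence minus co-event coherence."""
    coh_pre = np.asarray(coh_pre, dtype=float)
    coh_co = np.asarray(coh_co, dtype=float)
    if coh_pre.shape != coh_co.shape:
        raise ValueError("coherence maps must share a grid")
    # Coherence lies in [0, 1], so the difference lies in [-1, 1].
    return coh_pre - coh_co

# Hypothetical 2 x 2 coherence maps.
pre = [[0.90, 0.80], [0.70, 0.20]]   # pair acquired before the earthquake
co = [[0.85, 0.30], [0.70, 0.25]]    # pair spanning the earthquake
dpm = damage_proxy(pre, co)
```

In this toy example the top-right pixel loses the most coherence (0.5), which under the assumed sign convention would flag it as the most anomalous change.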
We also consider two geospatial covariates that are available after earthquakes and relate to
the trend in damage: the Modified Mercalli Intensity from the ShakeMap (Worden and Wald,
2016), and a Digital Elevation Model (DEM) derived from the Shuttle Radar Topography Mis-
sion (Jarvis et al., 2008; Farr et al., 2007). While elevation may not directly cause earthquake
damage, it could serve as a proxy for other factors such as construction quality in remote areas or
landslide occurrence. The use of elevation data for the application of G-DIF in Nepal demon-
strates how the trend model down-weights secondary datasets that are poor proxies for damage,
as shown in the modeling results for the trend (Section 4.3).
We discretize each dataset to a common grid of 0.0028° × 0.0028° (290 m × 290 m), resulting
in a study area with 80,200 grid points. We use this resolution to remove any personally identifiable
information, ensuring that more than one building is within each grid. The 11 considered
districts are mostly rural, so there are nine buildings per grid on average (though 0.25% of grids have
100 or more buildings).
The true damage from the field surveys, Z, is the mean damage grade of all buildings within
each grid. Out of 80,200 grids that contain buildings, we randomly selected 100 grids (contain-
ing 1056 buildings) as the set of locations that engineers could survey in the field. From here
on, we refer to the subset of grids as the field surveyed locations.
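The aggregation and sampling steps can be sketched with the standard library alone. The building coordinates and grades below are hypothetical, and the floor-based cell indexing is one simple way to assign points to a regular grid; it is not necessarily how the case-study grid was built.

```python
import math
import random
from collections import defaultdict

CELL = 0.0028  # grid size in degrees, as in the case study

def grid_key(lon, lat, cell=CELL):
    """Index of the grid cell containing a point."""
    return (math.floor(lon / cell), math.floor(lat / cell))

def mean_damage_by_grid(buildings, cell=CELL):
    """Average EMS-98 grade (1 to 5) of the buildings in each grid cell."""
    grades = defaultdict(list)
    for lon, lat, grade in buildings:
        grades[grid_key(lon, lat, cell)].append(grade)
    return {k: sum(v) / len(v) for k, v in grades.items()}

def sample_survey_grids(grid_damage, n, seed=0):
    """Random subset of grids standing in for the field surveyed locations."""
    keys = sorted(grid_damage)
    chosen = random.Random(seed).sample(keys, min(n, len(keys)))
    return {k: grid_damage[k] for k in chosen}

# Hypothetical buildings: (longitude, latitude, EMS-98 grade).
buildings = [
    (85.0001, 27.9001, 5),
    (85.0002, 27.9002, 3),   # same cell as the first building
    (85.0100, 27.9001, 1),   # a different cell
]
grid_damage = mean_damage_by_grid(buildings)
surveyed = sample_survey_grids(grid_damage, n=1)
```

Here the first two buildings share a cell with mean damage 4.0, and the third sits alone in a cell with mean damage 1.0.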
Exploratory analysis shows a positive relationship between the true damage and the sec-
ondary damage data as exhibited in the moving average curves in the left column of the matrix
in Figure 4. Specifically, the engineering forecast, shaking intensity, and elevation are linearly
related to the true damage, while the DPM shows a slightly nonlinear relationship. The form
of these discovered relationships should be considered when deciding on which trend model to use.
Figure 4. Summary of true damage from primary field survey and secondary damage data at all locations
(n = 80,200) and the subset of field surveyed locations (np = 100). The diagonal shows histograms
of each dataset, the scatter plots show relationships between datasets (including a moving average esti-
mate), and the bottom row maps the spatial patterns of each data set (with warm colors indicating larger
values). The left column of scatter plots highlights relationships between primary and secondary data.
Based on our observations of linearity between variables, we used a linear least squares regression
as the functional relationship between the true damage and each secondary dataset:
$$\hat{m}(s_0) = \sum_{k=0}^{p} \hat{\beta}_k X_k(s_0), \tag{10}$$
where $X_0$ is a vector of ones to estimate the intercept. We estimate the coefficient for each secondary
dataset, $\hat{\beta}_k$, through either ordinary least squares (OLS) regression or generalized
least squares (GLS) regression. We select the regression function that results in the lowest root
mean squared error, which is OLS regression in this example. We also build two trend models
for areas with and without DPM values, since the DPM covers about 40% of the considered
region (more details are in the Appendix). We compute the variance inflation factor (VIF) for
each secondary data variable to assess whether multicollinearity exists (James et al., 2013), but
find that the VIFs for each variable are below two, a low value that indicates multicollinearity
is not a problem with these data.
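For readers unfamiliar with the diagnostic, the variance inflation factor for variable k is 1 / (1 - R_k²), where R_k² comes from regressing that variable on all the others. A minimal sketch on synthetic covariates follows; the data and function name are hypothetical and unrelated to the case-study values.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_k = 1 / (1 - R_k^2), regressing column k on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for k in range(p):
        y = X[:, k]
        A = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = np.sum((y - A @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        vifs[k] = ss_tot / ss_res          # algebraically equal to 1 / (1 - R^2)
    return vifs

# Synthetic covariates: three independent columns, then a fourth nearly equal to the first.
rng = np.random.default_rng(1)
X_indep = rng.normal(size=(500, 3))
X_collinear = np.column_stack([X_indep, X_indep[:, 0] + 0.1 * rng.normal(size=500)])

vif_indep = variance_inflation_factors(X_indep)          # all close to 1
vif_collinear = variance_inflation_factors(X_collinear)  # first and last columns inflated
```

Independent covariates give values near one, while the nearly duplicated column produces a very large factor, which is the pattern the diagnostic is designed to flag.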
Building a trend model with the data at the field surveyed locations has two advantages.
First, the function in Equation 10 translates the numerous secondary damage data with differing
units of measurement (e.g., shaking amplitude, elevation, an arbitrary numerical scale for DPM)
into a collective unit, the mean damage grade, that has value for regional loss estimates and other
The second advantage of the trend model is that the modeler does not need to subjectively
weight the importance of each secondary dataset, instead allowing the data to determine
the importance of each dataset through the model coefficients. By examining each $\hat{\beta}_k$ and its
standard error, we observe which secondary dataset provides additional value in modeling the
trend. For example, in Figure 5a, we see the digital elevation model (DEM) has close to a
zero coefficient in the estimated trend, signifying the DEM has little additional effect on the
trend estimate when we account for the other secondary damage data. These coefficients are
comparable since we normalize all variables before developing the trend model. If we estimate
a zero coefficient for all secondary damage data, then the trend reduces to a constant mean (the
intercept). Note that the estimated coefficients shown in blue in Figure 5a are dependent on
the set of field surveyed grids and therefore may differ from the true coefficient estimates as
shown in black. These coefficient estimates are also specific to the Nepal earthquake, which
was a largely rural disaster; they are not a comment on the general utility of each dataset across all
earthquakes. Because the parameters of the trend model are based on its data inputs at the field
surveyed locations, G-DIF is calibrated to the data available after each specific earthquake.
Figure 5. (a) The coefficient estimates (blue dots) from the trend model using ordinary least squares
regression in the area with Damage Proxy Map values. Horizontal lines show the standard error and
black stars are coefficients using 10,000 grids. (b) The spatial correlation model using a Matern variogram,
showing the difference between the variogram of the true damage at the field surveyed grids (before
removing the trend) and the variogram of the residuals (after removing the trend). The vertical dotted line at
9.4 km highlights the range of spatial autocorrelation.
Similar to the trend model, the parameters of the spatial correlation model are calibrated to the
data rather than predetermined. In this case study, we estimate the parameters of a Matern theoretical
variogram model. Minimizing the residual sum of squares yields fitted parameters that
correspond to an exponential covariance:
$$C(h) = b \exp\left(-\frac{|h|}{r}\right) \quad (11)$$
where $b$ is equal to the variance of the residuals and $r$ is the range of spatial autocorrelation. In
this example, the fitted values are $b = 0.83$ and $r = 9.4$ km.
The variogram is related to the covariance through
$$\gamma(h) = b - C(h). \quad (12)$$
We verified the use of this model by comparing the variogram fitted with 100 field surveyed
grids shown in Figure 5b to the same model fit with 10,000 grids.
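The relationship between the covariance and the variogram can be checked numerically. The sketch below assumes the exponential form C(h) = b exp(-|h|/r) for the covariance, with the fitted values b = 0.83 and r = 9.4 km reported above; the function names are illustrative and are not taken from the study's code.

```python
import numpy as np

B = 0.83      # variance of the residuals (sill), as reported in the text
R_KM = 9.4    # range parameter in km, as reported in the text

def covariance(h, b=B, r=R_KM):
    """Exponential covariance C(h) = b * exp(-|h| / r) (assumed form)."""
    return b * np.exp(-np.abs(h) / r)

def variogram(h, b=B, r=R_KM):
    """Variogram gamma(h) = b - C(h)."""
    return b - covariance(h, b, r)

# Separation distances in km: zero, one range parameter, and a large distance
h = np.array([0.0, 9.4, 50.0])
gamma = variogram(h)
```

Under this assumed form, the variogram starts at zero, reaches b(1 - 1/e) at h = r, and approaches the sill b at large separation distances, which corresponds to the plateau of the residual variogram described for Figure 5b.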
The shape of the variogram highlights spatial characteristics of the data. The vertical dot-
ted line in Figure 5b is the range of spatial autocorrelation, r, of the true damage at the field
surveyed locations after removing the trend. The range is the maximum distance at which two
locations are spatially autocorrelated with one another. The estimated range of 9.4 km is specific
to this earthquake and depends on the choice of variogram and fitting procedure; we
evaluate the statistical robustness of the estimated range in the sensitivity analysis in the
following section. The variogram also shows that
we have successfully removed the preexisting trend from the data, since the variogram of the
true damage increases with distance, while the variogram of the residuals plateaus. If the trend
model were able to fully capture the spatial correlation in the true damage, the variogram would
reduce to a horizontal line with $\gamma(0) = \sigma^2(0)$ (i.e., the nugget-effect model), and performing
ordinary kriging would provide no additional effect on the final mean damage estimate. There-
fore, G-DIF adapts to allow varying levels of contribution from the spatial correlation model,
depending on how well the trend model estimates the true damage.
The implementation of G-DIF generates two main outputs: 1) a map of the mean damage
estimate for each of the 80,200 grids and 2) a map of uncertainty, or estimation variance, of
those estimates.
Damage estimate
The mean damage estimate map (Figure 6a) is the sum of the estimated trend and the estimated residuals.
The mean damage estimate reflects the trend model such that areas to the north exhibit
greater damage than areas to the south. This gradient in damage comes from the two most
important secondary damage data in the trend model, the shaking intensity and the engineering
forecast, which have higher values towards the north.
The mean damage estimate reflects the spatial correlation model through the similarity in
mean damage estimates surrounding the field surveyed locations shown in black in Figure 6a.
These spatial similarities are particularly visible to the northeast of Kathmandu, where there is
clustering of high damage around the field surveyed points. These similarities are due to the
variogram, which estimates small-scale fluctuations based on damage at nearby field surveyed locations.
This map can then be used to estimate total costs of damage. Here, we assume the estimated
mean damage grade is the same for all buildings within a grid and that the number of buildings
per grid is known. By multiplying the estimated mean damage grade by the number of build-
ings, an assumed ratio of each type of construction material, and the repair or reconstruction
cost for each construction material, we obtain an estimate of 315 billion NPR (2.8 billion USD)
for the cost of repair and reconstruction. This estimate is almost the same as the total damages
to the housing sector of 303 billion NPR (2.7 billion USD) reported in the PDNA. While this
economic loss estimate is not a primary focus of this study, it is provided here to illustrate that
these damage predictions can be converted to regional economic loss estimates.
Estimation variance
The map of the estimation variance is the sum of the variance from estimating both the trend and
residuals (Equation 9). In the case of least squares regression and ordinary kriging, we solve
for the trend coefficients ($\boldsymbol{\beta}$) and kriging weights ($\boldsymbol{\lambda}$) by minimizing the variance of the error at
the field surveyed locations. These two procedures result in an estimation variance of the trend
and residuals at all locations (specific equations are included in the Electronic Supplement).
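To make the kriging step concrete, the following Python sketch solves a generic ordinary kriging system for the residuals at one target location and returns the estimate, the kriging variance, and the weights. It is a textbook formulation rather than the implementation used in this study. The exponential covariance with b = 0.83 and r = 9.4 km is assumed, the coordinates and residual values are hypothetical, and the returned variance covers only the residual component, not the variance of the trend.

```python
import numpy as np

def exp_cov(h, b=0.83, r=9.4):
    """Assumed exponential covariance; h is a distance in km."""
    return b * np.exp(-h / r)

def ordinary_kriging(coords, resid, target, cov=exp_cov):
    """Krige residuals observed at coords (n x 2, km) to a single target point."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)

    # Kriging system with a Lagrange multiplier mu enforcing sum(weights) = 1
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    rhs = np.append(cov(d0), 1.0)

    sol = np.linalg.solve(A, rhs)
    lam, mu = sol[:n], sol[n]

    estimate = lam @ resid
    variance = cov(0.0) - lam @ cov(d0) - mu
    return estimate, variance, lam

# Hypothetical surveyed grids (km) and their trend residuals
coords = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [20.0, 20.0]])
resid = np.array([0.4, -0.2, 0.1, 0.6])

est_obs, var_obs, lam_obs = ordinary_kriging(coords, resid, coords[0])
est_near, var_near, lam_near = ordinary_kriging(coords, resid, np.array([1.0, 1.0]))
est_far, var_far, lam_far = ordinary_kriging(coords, resid, np.array([200.0, 200.0]))
```

With no nugget term, the estimate at a surveyed location reproduces the observed residual with near-zero variance, and the variance grows with distance from the surveyed locations.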
We can interpret the estimation variance, $\hat{\sigma}^2(s)$, as our uncertainty in the mean damage estimate
at each grid. The model assumes the uncertainty in the mean damage estimate varies
according to a Gaussian probability distribution with $\hat{Z}(s)$ as the mean and $\hat{\sigma}(s)$ as the standard
deviation: $Z(s) \sim N(\hat{Z}(s), \hat{\sigma}(s))$. The variance quantifies the uncertainty in 1) the trend
estimation due to the relationships between the primary and secondary data and 2) spatial estimation
of the residuals. The spatial uncertainty is visible in the estimation variance map as
shown by the higher variances at grids that are located farther from the field survey locations in
Figure 6b.
The map of the estimation variance can be used to guide where future field surveys should
be carried out and to propagate uncertainty when estimating further losses. Since the variance
depends on the location of the field surveyed grids, surveyors could assess damage in areas with
higher variance to reduce the overall uncertainty.
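One possible way to use the two output maps together is sketched below: a Gaussian interval is built from the mean and the estimation variance at each grid, and grids are ranked by variance to suggest where additional surveys might be placed. The numbers are hypothetical, the 95% level (z = 1.96) is an assumption, and the function names are illustrative only.

```python
import numpy as np

def prediction_interval(mean, variance, z=1.96):
    """Gaussian interval mean +/- z * sd; z = 1.96 gives roughly 95% coverage."""
    sd = np.sqrt(variance)
    return mean - z * sd, mean + z * sd

def next_survey_candidates(variance, k):
    """Indices of the k grids with the largest estimation variance."""
    return np.argsort(variance)[::-1][:k]

# Hypothetical mean damage grades and estimation variances for four grids
mean = np.array([2.1, 3.4, 1.8, 4.0])
variance = np.array([0.20, 0.65, 0.05, 0.90])

lower, upper = prediction_interval(mean, variance)
candidates = next_survey_candidates(variance, k=2)
```

Because the Gaussian assumption is unbounded, such an interval can extend beyond the range of the damage grade scale, so it may need to be clipped before being reported.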
In this section, we compare G-DIF's spatially varying damage estimate and variance to the engineering
forecast, which is the current standard of practice for estimating post-earthquake damage.
Visually, we can see that G-DIF's mean damage estimate from the example set of 100
field surveyed grids presented in the previous section (Figure 6a) resembles the true damage
(Figure 6c) more than the engineering forecast does (Figure 6d). Going further, we quantify the performance
of G-DIF's outputs to demonstrate that its mean damage estimate has lower total error
and improved uncertainty quantification. Since G-DIF heavily depends on the field survey data,
we also perform a sensitivity analysis of G-DIF's outputs to the number and placement of field
surveyed locations used to build the model.
Figure 6. Results of the framework for an example set of 100 field surveyed locations including (a) the
mean damage estimate and (b) the estimation variance. The results of (a) can be compared to (c) the true
damage from all field surveys and (d) the engineering forecast converted to mean damage grade.
Using the mean damage estimate from the previous case study, we quantify the error between
the predicted and observed damage at all validation grids, as shown in Figure 7a. The distribution
of prediction error highlights 1) the bias, or the mean error (whether the damage estimate
is systematically under- or overestimating damage), and 2) the variance, or how precise the damage
estimate is across all grids. Note that the engineering forecast only results in mean damage grade
values that are whole integers (1, 3, and 4) after binning the predicted mean damage ratio per
grid, leading to spikes in its errors at whole integers in Figure 7a. The lower bias and variance
of G-DIF's mean damage estimate lead to a mean squared error (MSE), a performance metric
which combines both bias and variance, of MSE = 0.853, which is 47% lower than that of the engineering
forecast with MSE = 1.62.
Figure 7. Histogram of errors between the predicted and observed damage for a) all 11 considered
districts, b) Makawanpur district (southwest of Kathmandu Valley), and c) Nuwakot district (northwest
of Kathmandu Valley). Histograms highlight the lower bias, variance, and mean squared error for G-DIF
when using an example set of 100 field surveyed grids.
While G-DIF's MSE is nearly half that of the engineering forecast over the full study area, the difference
is even larger when looking at individual districts within the study area (Figure 7b and
c). Over the full study area, the engineering forecast captures the overall trend in damage
over large regions, but is limited by the resolution of the underlying building inventory data,
which is only available at the district level. G-DIF's advantage over the forecast is that it is
locally calibrated to field surveys within a district, so its mean damage estimates for smaller
regions have lower bias, variance, and MSE. For example, G-DIF's mean damage estimate has a
lower bias (bias = 0.038) and higher precision (standard deviation = 1.122) than that of the engineering
forecast (bias = 0.904, standard deviation = 1.477) when considering the errors only for
Makawanpur, the district directly southwest of Kathmandu Valley (Figure 7b). G-DIF is consistently
more accurate at the local level for 9 out of the 11 districts, as seen in the error histogram
for Nuwakot in Figure 7c and the other districts depicted in the electronic supplement.
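Since the MSE combines bias and variance as MSE = bias^2 + variance, the Makawanpur figures above can be converted into an approximate district-level MSE. The short Python sketch below does this arithmetic. It assumes the reported standard deviations are those of the prediction errors, and the function name is illustrative.

```python
def mse_from_bias_sd(bias, sd):
    """MSE = bias^2 + variance, where variance = sd^2."""
    return bias ** 2 + sd ** 2

# Bias and standard deviation reported in the text for Makawanpur district
gdif_mse = mse_from_bias_sd(bias=0.038, sd=1.122)
forecast_mse = mse_from_bias_sd(bias=0.904, sd=1.477)

# Relative reduction in MSE of G-DIF compared with the engineering forecast
reduction = 1.0 - gdif_mse / forecast_mse
```

Under these assumptions the implied MSE is roughly 1.26 for G-DIF and 3.00 for the engineering forecast, a reduction of about 58%, which is consistent with the gap widening at the district level relative to the 47% reduction over the full study area.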
G-DIF's final mean damage estimate varies depending upon the sampled primary data, especially
with few field survey grids to build the trend and spatial correlation models or with secondary
data that are not strongly predictive of damage. The goal of this section is to quantify how
G-DIF's performance depends on the number and placement of the field surveyed locations
used to build the framework. For field survey sets ranging in size from 25 to 1000 locations, we
simulate G-DIF's mean damage estimate using 1000 random samples of different placements
and assess its performance. Figure 8 shows the distribution of the MSE and bias for each of
these simulations.
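The structure of such a sensitivity analysis can be sketched in a few lines of Python. The example below is a simplified stand-in and not the analysis performed in this study: it uses synthetic data, a trend-only OLS predictor with no kriging step, a single secondary variable, and 200 rather than 1000 random placements per sample size. All names and values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic study area: one secondary variable and a "true" damage value per grid
n_grids = 2000
secondary = rng.normal(size=n_grids)
true_damage = 3.0 + 0.7 * secondary + rng.normal(scale=0.8, size=n_grids)

def mse_for_sample(sample_idx):
    """Fit the trend on the sampled grids and score it on all remaining grids."""
    mask = np.zeros(n_grids, dtype=bool)
    mask[sample_idx] = True
    design = np.column_stack([np.ones(mask.sum()), secondary[mask]])
    beta, *_ = np.linalg.lstsq(design, true_damage[mask], rcond=None)
    pred = beta[0] + beta[1] * secondary[~mask]
    return np.mean((pred - true_damage[~mask]) ** 2)

def simulate(n_surveys, n_sims=200):
    """MSE for n_sims random placements of n_surveys field surveyed grids."""
    return np.array([
        mse_for_sample(rng.choice(n_grids, size=n_surveys, replace=False))
        for _ in range(n_sims)
    ])

results = {n: simulate(n) for n in (25, 100, 400)}
mean_mse = {n: mse.mean() for n, mse in results.items()}
spread = {n: mse.std() for n, mse in results.items()}
```

In this simplified setting, both the average MSE and its spread across placements shrink as the number of surveyed grids grows, which mirrors the qualitative pattern described for Figure 8a.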
Figure 8. Histograms of (a) the accuracy of the mean damage estimate (MSE) and (b) the performance
of the estimation variance from the sensitivity analysis of the number and placement of field surveyed
locations used to develop G-DIF. As more field surveys are collected, the accuracy improves and does
not depend as much on the placement of field surveyed locations.
As expected, as the number of field surveyed locations increases, the G-DIF MSE decreases
(accuracy increases) and is consistently lower than that of the engineering forecast, regardless
of the placement of the field surveyed locations. Given that the MSE can take values between 0
and 16, the MSE from G-DIF is relatively low. Figure 8a shows histograms of MSE for repeated
analyses using varying samples of data, and for five amounts of sampled data. G-DIF MSE is
lower than that of the engineering forecast for 99.7% of the simulations when using 50 field
surveyed grids, and the percentage is even higher when more survey locations are used.
When we separate the bias from the MSE, we see that G-DIF's distribution of bias is low
relative to the full range of possible bias (-4 to 4). The G-DIF damage prediction can result in
a more biased result than the engineering forecast, as shown by the areas of the distributions of
bias outside of the vertical bounds of the engineering forecast's mean error in Figure 8b. This
is partly due to the fact that G-DIF's mean damage estimate depends on how representative the
field survey set is of the true distribution of damage. With more biased field survey sets, the final
estimate is more biased, but sample bias can be avoided with a sufficiently large field survey
if the field survey comes from a random sample. With 500 field survey locations, 86% of the
simulations are less biased than the engineering forecast.
G-DIF can also be more biased than the engineering forecast because the forecast has a low
mean error of -0.056 for the full study region, as discussed in the previous section. However,
the sensitivity analysis confirms G-DIF is more precise when considering sub-regions. Since
the MSE is the sum of the variance and squared bias, the reduction in MSE with more field
surveyed locations in Figure 8a is due to the reduction in variance of the error. This reduction
in variance means that grid-level estimates become more precise. So overall, there is lower
variation in the error considering the high resolution of G-DIF’s mean damage estimate.
We also evaluated the statistical robustness of the estimate of the range of spatial autocorre-
lation. After 1000 simulations using 1000 field surveyed locations, the range of the unexplained
damage is on average 14 km. This range is consistent with the range of 15-20 km reported for
damage ratios from the 1994 Northridge earthquake (Shome et al., 2012) and plausible given
that the range of spatial correlation for ground motion intensities can vary between 10 and 60 km
(Jayaram and Baker, 2009).
By comparing to the engineering forecast, we show that G-DIF provides a credible damage
estimate to support post-earthquake decisions. Whether G-DIF is advantageous over using tra-
ditional methods to rapidly estimate damage depends on the amount of primary field data, the
quality of the secondary data, and the scale at which decisions are made. In cases where a
well-calibrated engineering forecast is available, the study region is large, or there are few field
surveyed grids, the engineering forecast will provide reasonable damage estimates.
Often, however, the engineering forecast may be a general model rather than one calibrated
for the specific region, or the input inventory data may be of low quality and resolution. In such
cases, if there are sufficient field surveys available, G-DIF will likely provide a damage estimate
that is comparable to or more accurate than that of an engineering forecast. This is because
of the approach's ability to calibrate an event-specific prediction and to also perform spatial
interpolation between survey points. In this formulation, we consider measurements from the
field to be exact.
A main advantage of G-DIF is that it provides locally accurate damage estimates that can be
leveraged for loss estimates and higher resolution decisions. This means that within sub-regions,
G-DIF's damage estimate will calibrate the engineering forecast, and other secondary damage
data, to the field surveys within that region (as seen in the error histogram for Makawanpur
district in Figure 7b). To improve the local accuracy of the damage estimate, surveyors can use
the uncertainty estimate to guide the collection of additional damage assessments. By surveying
in areas with greater uncertainty, the overall uncertainty of the damage estimate will decrease.
In this study, we propose a geospatial data integration framework (G-DIF) to produce a spatial
damage prediction in the weeks after an earthquake. G-DIF uses a limited sample of local and
accurate field surveys to calibrate predictions based on heterogeneous and uncertain damage
data from engineering forecasts, remote sensing and other sources. The uncertain data can
arrive in varying formats, measurement units, and levels of accuracy.
The geostatistical technique, regression kriging, applied in G-DIF consists of two models.
The first is a trend model that estimates the mean damage, a deterministic value that varies
in space, using secondary damage data. The second is a spatial correlation model that esti-
mates the stochastic and spatially correlated residuals between the estimated trend and the true
damage. The separate modeling of these two components allows the framework to produce a
sophisticated trend model when the secondary data is strongly predictive, plus a spatial inter-
polation between observations when the secondary damage data has less predictive power. The
framework is flexible to implement—the modeler can choose the functional form of the trend
prediction (linear or nonlinear) and spatial correlation (variogram) model, depending on the
data available for the event of interest.
Data collected after the 2015 Nepal earthquake was used to demonstrate the implementation
of G-DIF. Out of 80,200 grids in our area of interest, we used a sample of 100 grids as an
example of field surveyed locations and found that the mean damage estimated at the other
80,100 grids had a higher accuracy (lower mean squared error) than a benchmark based on a
current engineering forecast. Moreover, G-DIF provides a mean damage estimate that is more
accurate for smaller regions than the engineering forecast used in this study, because it locally
calibrates all secondary data to field surveys. Modelers can then use this spatially varying mean
damage estimate to calculate costs of repair and reconstruction.
In addition to the mean damage estimate map, G-DIF creates a map of the estimation un-
certainty, which is important for interpreting results, and a significant addition to the current
state of practice for standard damage maps from engineering forecasts or remote sensing dam-
age data (e.g. Jaiswal and Wald, 2011; Yun et al., 2015; Copernicus Emergency Management
Service, 2019). Post-disaster modelers or decision-makers can use this estimation variance to
propagate uncertainty into further impact models or decide where to collect more field surveys
to reduce the uncertainty.
With this method we do not explicitly account for uncertainty in the field surveyed assessment
that results from survey subjectivity and aggregation per grid. The subjectivity in the
field surveyed measurement has the potential to be mitigated (Booth et al., 2011), so we have
made the assumption that its uncertainty is negligible relative to other data sources. This framework
can be extended to address the uncertainty due to aggregation through Bayesian updating
of the damage estimate per grid, similar to that presented in Booth et al. (2011), though this
would require estimates of prior and posterior distributions for each dataset and would be more
computationally intensive.
With even a small amount of field survey data, G-DIF predictions have improved accuracy
relative to standard engineering forecasts. Through Monte Carlo simulations of the number and
locations of field surveys, we found that G-DIF consistently resulted in a damage map with
lower mean squared error than an engineering forecast when using more than 50 field surveyed
locations. Given that we predict the damage at 80,150 grid locations using 50 field surveyed
locations, our framework required 0.06% of the grids to be surveyed to improve the
estimate of an engineering forecast. In the case of Nepal, 50 field surveyed locations could contain
between 250 and 1150 buildings, which could be feasibly assessed within the first 2-4 weeks in
remote, mountainous contexts. While this timeframe may seem long, a few weeks is a sufficient
amount of time for this approach to inform important decisions, such as the PDNA, which is a
major use case.
While our results show an improved mean damage estimate with a small percentage of
field surveyed buildings, the placement of field surveys influences these results. Through the
sensitivity analysis of the framework to the field surveyed locations, we found G-DIF’s mean
damage estimate depends on how well the field survey set represents the full damage distri-
bution. A biased set of field surveyed locations can lead to biased results—in the case of the
Nepal earthquake, sets of more than 500 grids were less likely to be biased. To develop the
spatial correlation model at low separation distances, the field survey set should also consist of
locations within the spatial correlation range. To collect field data suited for G-DIF, surveys
can be strategically placed to collect damage assessments for all buildings within selected grids
so the sample has the full distribution of damage and sufficient spatial coverage, similar to the
methods of the REACH survey (e.g. REACH, 2014).
The advantage of G-DIF over standard damage estimates, such as the engineering forecast,
is apparent from the Nepal case study. The Nepal earthquake affected a large, mostly rural,
region over multiple districts. In this case, secondary data was uncertain because the engineer-
ing forecast was developed using low-fidelity data and the damage proxy map was observing
changes to both the built environment and vegetation. We expect many future earthquakes to be
similar in that there will be a limited sample of accurate field data to calibrate damage predic-
tions from multiple uncertain data. Therefore, the framework presented here could be extended
through testing with earthquakes occurring in different built environments or even other types
of disasters, as suggested in Shome et al. (2012).
Overall, the outputs of this framework are useful for stakeholders involved in post-disaster
loss assessments (like the PDNA) or recovery aid allocation, such as the affected national gov-
ernment, multilateral or bilateral donor agencies, or civil society organizations. In post-disaster
settings, these stakeholders are often overloaded with making many decisions based on the
uncertain data that are available at that time. By combining multiple data, this framework auto-
matically weights those damage datasets according to their ability to predict damage observed
in the field surveys, and synthesizes them to develop one map of damage. Therefore, the frame-
work allows stakeholders to address the hurdle of weighing the reliability of input data versus
its availability, so they can ultimately make more informed decisions for a more effective
regional recovery.
The data and R code to develop all results for the Nepal case study example presented in this
paper are available at with an interactive notebook of the
code at nb.html.
We would like to thank Anna Michalak, David Wald, Kishor Jaiswal, Brendon Bradley, and
Robert Soden for their contributions and feedback developing this framework. We would like to
thank the Government of Nepal, especially the National Planning Commission, Central Bureau
of Statistics and National Reconstruction Authority, for collecting this groundtruth damage data
and making its anonymized version available for broader uses and Arogya Koirala and Roshan
Paudel for their assistance in preparing this data. Part of the research was carried out at the Jet
Propulsion Laboratory, California Institute of Technology, under a contract with the National
Aeronautics and Space Administration. This work is funded by the National Science Founda-
tion Graduate Research Fellowship Program, the National Research Foundation of Singapore
grant NRF-NRFF2018-06, and the World Bank's Trust Fund for Statistical Capacity Building
(TFSCB) with financing from the United Kingdom’s Department for International Development
(DFID), the Government of Korea, and the Department of Foreign Affairs and Trade of Ireland.
Applied Technology Council, 1989. Procedures of Postearthquake Safety Evaluation of Buildings. Tech.
rep., Applied Technology Council, Redwood City, CA.
Bhattacharjee, G., Barns, K., Loos, S., Lallemant, D., Deierlein, G., and Soden, R., 2018. Developing
a User-Centric Understanding of Post-Disaster Building Damage Information Needs. In 11th U.S.
National Conference on Earthquake Engineering. Los Angeles, CA.
Boore, D. M., Gibbs, J. F., Joyner, W. B., Tinsley, J. C., and Ponti, D. J., 2003. Estimated Ground
Motion From the 1994 Northridge, California, Earthquake at the Site of the Interstate 10 and La
Cienega Boulevard, West Los Angeles, California. Bulletin of the Seismological Society of America
93, 2737–2751.
Booth, E., Saito, K., Spence, R., Madabhushi, G., and Eguchi, R. T., 2011. Validating Assessments of
Seismic Damage Made from Remote Sensing. Earthquake Spectra 27, S157–S177.
Bright, E. A., Coleman, P. R., Rose, A. N., and Urban, M. L., 2012. LandScan 2011. Oak Ridge National
Chatterjee, A., Michalak, A. M., Kahn, R. a., Paradise, S. R., Braverman, A. J., and Miller, C. E., 2010.
A geostatistical data fusion technique for merging remote sensing and ground-based observations of
aerosol optical thickness. Journal of Geophysical Research 115, 1–12. doi:10.1029/2009JD013765.
Chiles, J.-P. and Delfiner, P., 2012. Geostatistics: Modeling Spatial Uncertainty. 2 edn. Wiley Series in
Probability and Statistics, New York, NY. ISBN 978-0471083153, 734 pp.
Copernicus Emergency Management Service, 2019. Rapid Mapping Portfolio.
Corbane, C., Saito, K., Dell'Oro, L., Bjorgo, E., Gill, S., Boby, P., Huyck, C., Kemper, T., Lemoine, G.,
Spence, R., Shankar, R., Senegas, O., Ghesquiere, F., Lallemant, D., Evans, G., Gartley, R., Toro, J.,
Ghosh, S., Svekla, W., Adams, B., and Eguchi, R. T., 2011. A Comprehensive Analysis of Building
Damage in the 12 January 2010 Mw7 Haiti Earthquake Using High-Resolution Satellite and Aerial
Imagery. Photogrammetric Engineering & Remote Sensing 77, 997–1009.
Dong, L. and Shan, J., 2013. A comprehensive review of earthquake-induced building damage detection
with remote sensing techniques. ISPRS Journal of Photogrammetry and Remote Sensing 84, 85–99.
Earthquake Engineering Research Institute, 2015. Learning From Earthquake (LFE) Program. Tech.
rep., Earthquake Engineering Research Institute, Oakland, CA.
Erdik, M., Sesetyan, K., Demircioglu, M., Zulfikar, C., Hancilar, U., Tuzun, C., and Harman-
dar, E., 2014. Rapid Earthquake Loss Assessment After Damaging Earthquakes. In Geotechni-
cal, Geological and Earthquake Engineering, vol. 34, pp. 53–96. ISBN 9783319071176. doi:
Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Ro-
driguez, E., Roth, L., Seal, D., Shaffer, S., Shimada, J., Umland, J., Werner, M., Oskin, M., Burbank,
D., and Alsdorf, D. E., 2007. The shuttle radar topography mission. Reviews of Geophysics 45.
Ghosh, S., Huyck, C. K., Greene, M., Gill, S. P., Bevington, J., Svekla, W., DesRoches, R., and Eguchi,
R. T., 2011. Crowdsourcing for Rapid Damage Assessment: The Global Earth Observation Catastro-
phe Assessment Network (GEO-CAN). Earthquake Spectra 27, S179–S198.
Goda, K. and Hong, H. P., 2008. Spatial correlation of peak ground motions and response spectra.
Bulletin of the Seismological Society of America 98, 354–365. doi:10.1785/0120070078.
Grujic, O., 2017. Subsurface Modeling with Functional Data. Ph.D. thesis, Stanford University.
Grünthal, G., 1998. European Macroseismic Scale 1998, vol. 15. ISBN 2879770084, 100 pp.
Gunasekera, R., Daniell, J., Pomonis, A., Arias, R. A., Ishizawa, O., and Stone, H., 2018. Methodology
Note on the Global RApid post-disaster Damage Estimation (GRADE) approach. Tech. rep., Global
Facility for Disaster Reduction and Recovery, Washington, DC.
Hengl, T., Heuvelink, G., and Stein, A., 2003. Comparison of kriging with external drift and regression-
kriging. Technical note, ITC p. 17. doi:10.1016/S0016-7061(00)00042-2.
Hengl, T., Heuvelink, G. B. M., and Stein, A., 2004. A generic framework for spatial prediction of soil
variables based on regression-kriging. Geoderma 120, 75–93. doi:10.1016/j.geoderma.2003.08.018.
Hunt, A. and Specht, D., 2019. Crowdsourced mapping in crisis zones: collaboration, organisation and
impact. Journal of International Humanitarian Action 4, 1–11. doi:10.1186/s41018-018-0048-1.
Huyck, C. K., 2015. Gorkha (Nepal) Earthquake Response.
Jaiswal, K., Wald, D., and Hearne, M., 2009. Estimating casualties for large earthquakes worldwide
using an empirical approach: US geological survey open-file report, OF 2009-1136, 78 p. Tech. rep.
Jaiswal, K. and Wald, D. J., 2011. Rapid Estimation of the Economic Consequences of Global Earthquakes.
Tech. rep., USGS, Reston, VA.
James, G., Witten, D., Hastie, T., and Tibshirani, R. J., 2013. An Introduction to Statistical Learning.
Springer, New York, NY. ISBN 9781461471370, 1–440 pp.
Jarvis, A., Reuter, H. I., Nelson, A., and Guevara, E., 2008. Hole-filled seamless SRTM data V4.
Jayaram, N. and Baker, J., 2009. Correlation model for spatially distributed ground-motion intensities.
Earthquake Engineering & Structural Dynamics.
JICA, 2002. The study on earthquake disaster mitigation in the Kathmandu Valley, Kingdom of Nepal.
Tech. rep., Japan International Cooperation Agency : Nippon Koei Co., Ltd. : Oyo Corp.
Kerle, N., 2013. Remote Sensing Based Post-Disaster Damage Mapping with Collaborative Methods.
Intelligent Systems for Crisis Management pp. 121–133. doi:10.1007/978-3-642-33218-0.
Kerle, N. and Hoffman, R. R., 2013. Collaborative damage mapping for emergency response : the role
of Cognitive Systems Engineering. Natural hazards and earth system sciences 13, 97–113.
Lallemant, D. and Kiremidjian, A., 2013. Rapid post-earthquake damage estimation using remote-
sensing and field-based damage data integration. In Safety, Reliability, Risk and Life-Cycle Perfor-
mance of Structures and Infrastructures, pp. 3399–3406. CRC Press.
Lallemant, D., Soden, R., Rubinyi, S., Loos, S., Barns, K., and Bhattacharjee, G., 2017. Post-
Disaster Damage Assessments as Catalysts for Recovery: A Look at Assessments Conducted in
the Wake of the 2015 Gorkha, Nepal, Earthquake. Earthquake Spectra 33, S435–S451. doi:
Loos, S., Barns, K., Bhattacharjee, G., Soden, R., Herfort, B., Eckle, M., Giovando, C., Girardot, B.,
Saito, K., Deierlein, G., Kiremidjian, A., Baker, J. W., and Lallemant, D., 2018. The Development and
Uses of Crowdsourced Building Damage Information based on Remote-Sensing. Tech. rep., Stanford,
McBratney, A. B., Odeh, I. O., Bishop, T. F., Dunbar, M. S., and Shatar, T. M., 2000. An overview of
pedometric techniques for use in soil survey, vol. 97. ISBN 0016-7061, 293–327 pp. doi:10.1016/
Monfort, D., Negulescu, C., and Belvaux, M., 2019. Remote sensing vs. field survey data in a post-
earthquake context: Potentialities and limits of damaged building assessment datasets. Remote Sens-
ing Applications: Society and Environment 14, 46–59. doi:10.1016/j.rsase.2019.02.003.
Motaghian, H. R. and Mohammadi, J., 2011. Spatial estimation of saturated hydraulic conductivity from
terrain attributes using regression, kriging, and artificial neural networks. Pedosphere 21, 170–177.
Nepal Earthquake Housing Reconstruction Multi-Donor Trust Fund, 2016. Nepal Earthquake Housing
Reconstruction Annual Report. Tech. rep., Nepal Earthquake Housing Reconstruction Multi-Donor
Trust Fund, Kathmandu, Nepal.
Odeh, I. O. A., McBratney, A. B., and Chittleborough, D. J., 1994. Spatial prediction of soil properties
from landform attributes derived from a digital elevation model. Geoderma 63, 197–214. doi:10.
Oliver, M. A. and Webster, R., 2014. A tutorial guide to geostatistics: Computing and modelling vari-
ograms and kriging. Catena 113, 56–69. doi:10.1016/j.catena.2013.09.006.
REACH, 2014. Groundtruthing Open Street Map Building Damage Assessment: Haiyan Typhoon - The
Philippines. Tech. Rep. April, REACH; American Red Cross; USAID.
Shelter Cluster Nepal, 2015. Shelter and Settlements Vulnerability Assessment: Nepal 25 April / 12 May
Earthquakes Response Nepal. Tech. Rep. June, Shelter Cluster Nepal, Nepal.
Shome, N., Jayaram, N., and Rahnama, 2012. Uncertainty and Spatial Correlation Models for Earthquake
Losses. In 15th World Conference on Earthquake Engineering (15WCEE), p. 10. Lisbon,
Thompson, E. M., Baise, L. G., Kayen, R. E., Tanaka, Y., and Tanaka, H., 2010. A geostatistical
approach to mapping site response spectral amplifications. Engineering Geology 114, 330–342.
Trendafiloski, G., Wyss, M., and Rosset, P., 2009. Loss Estimation Module in the Second Generation
Software QLARM. In Second International Workshop on Disaster Casualties, June, pp. 1–10.
Cambridge, UK. ISBN 9789048194551. doi:10.1007/978-90-481-9455-1.
United Nations Office for the Coordination of Humanitarian Affairs, 2019. ReliefWeb - Informing
humanitarians worldwide.
Université catholique de Louvain (UCL) - CRED and Guha-Sapir, D. EM-DAT: The Emergency
Events Database.
Wald, D. J., Jaiswal, K. S., Marano, K. D., Garcia, D., So, E., and Hearne, M., 2012. Impact-Based
Earthquake Alerts with the U.S. Geological Survey's PAGER System: What’s Next? In 15th World
Conference on Earthquake Engineering, Lisbon, Portugal.
Westrope, C., Banick, R., and Levine, M., 2014. Groundtruthing OpenStreetMap Building Damage
Assessment. Procedia Engineering 78, 29–39.
Worden, C. B., Thompson, E. M., Baker, J. W., Bradley, B. A., Luco, N., and Wald, D. J., 2018. Spatial
and Spectral Interpolation of Ground-Motion Intensity Measure Observations. Bulletin of the
Seismological Society of America. doi:10.1785/0120170201.
Worden, C. B. and Wald, D., 2016. ShakeMap Manual. Tech. rep.
Yun, S.-h., Hudnut, K., Owen, S., Webb, F., Sacco, P., Gurrola, E., Manipon, G., Liang, C., Fielding, E.,
Milillo, P., Hua, H., and Coletta, A., 2015. Rapid Damage Mapping for the 2015 Mw 7.8 Gorkha
Earthquake Using Synthetic Aperture Radar Data from COSMO-SkyMed and ALOS-2 Satellites.
Seismological Research Letters 86, 1549–1556. doi:10.1785/0220150152.
... and are more precise than predictions obtained from curves [4,18]. The current study focused on predictions using field-based samples. ...
... Field-based data must be well-dispersed across the entire study area to provide reliable and unbiased results [19–23], but this is often overlooked in studies. In addition, earthquake damage data are spatially correlated [4,18,24–30], which is a function of the distance between the locations of the data points [6,31,32]. This means that nearby samples will be more similar than distant samples in terms of factors such as construction age and material and local soil conditions. ...
... Spatially correlated regression models can model unknown, unmeasurable or unobservable effective factors through the use of spatially correlated residual terms [4,18,40,41]. The KR model was proposed by Ahmed and de Marsily [42] to produce maps of the distribution of transmissivity in an aquifer. ...
... Reducing the loss of life and economic and social impact in the aftermath of an earthquake goes hand in hand with reducing the vulnerability of buildings to the hazard (Loos et al., 2020). Seismically vulnerable buildings include non-engineered construction and buildings with design deficiencies such as soft-story and vertical stiffness irregularities. ...
... Seismic assessment methods that can capture such deficiencies range from empirical assessment approaches such as on-site screenings and scoring methods to more accurate and detailed methods such as non-linear time history analysis (Kassem et al., 2020). But the high quantity of buildings in a given city can make it challenging to allocate sufficient funds for retrofit programs in buildings where it is more necessary (Loos et al., 2020). Cities with large building inventories will appreciate the potential benefits of automation and have access to an automated ranking system that will expose those buildings in their inventory that need to be carefully studied and, if necessary, retrofitted before a major seismic event tests the built environment. ...
With the overwhelming number of older reinforced concrete buildings that need to be assessed for seismic vulnerability in a city, local governments face the question of how to assess their building inventory. By leveraging engineering drawings that are stored in a digital format, a well-established method for classifying reinforced concrete buildings with respect to seismic vulnerability, and machine learning techniques, we have developed a technique to automatically extract quantitative information from the drawings to classify vulnerability. Using this technique, stakeholders will be able to rapidly classify buildings according to their seismic vulnerability and have access to information they need to prioritize a large building inventory. The approach has the potential to have a significant impact on our ability to rapidly make decisions related to retrofit and improvements in our communities. In Los Angeles County alone, it is estimated that several thousand buildings of this type exist. The Hassan index is adopted here as the method for automation due to its simple application during the classification of the vulnerable reinforced concrete buildings. This paper will present the technique used for automating information extraction to compute the Hassan index for a large building inventory.
... Stojadinovic et al. [12] have extended the use of RFs and formulated an operational methodology for rapid regional repair cost estimation that requires careful prioritization of inspections. Using inspection data from the 2015 Nepal earthquake, Loos et al. [13] employed geo-statistical techniques to predict the mean damage state aggregated to equally spaced grid cells. However, the proposed method only applies to continuous data, requiring aggregation and averaging of categorical inspection results over grid cells, potentially leading to information loss and constraining the inspection process, because all buildings in a grid cell have to be inspected. ...
... GP regression is also widely applied in the geo-statistics field, where it is known as kriging [23] and focuses on two-dimensional input spaces. The method for post-earthquake damage mapping proposed by Loos et al. [13] first uses the non-spatial inputs in a least-squares regression to remove a linear trend in the data, and then uses the geo-coordinates to update GP models with the remaining residuals. The method proposed by Sheibani and Ou [14] also relies on GP regression but applies it to the entire input space and not only on the geocoordinates. ...
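The two-step procedure described in this excerpt (least-squares trend on the non-spatial inputs, then a spatial model of the residuals) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the exponential covariance model, the parameters `sill`, `corr_range`, and `nugget`, and the function names are all assumptions made for the example, and the returned variance covers only the residual field, ignoring uncertainty in the fitted trend coefficients.

```python
import numpy as np

def exp_cov(a, b, sill=1.0, corr_range=1.0):
    """Exponential covariance between two sets of 2-D coordinates (assumed model)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / corr_range)

def regression_kriging(coords, X, y, coords_new, X_new,
                       sill=1.0, corr_range=1.0, nugget=1e-6):
    """Sketch of regression kriging with fixed, user-supplied covariance parameters.

    Step 1: ordinary least squares on the non-spatial inputs X (with intercept).
    Step 2: simple kriging (zero-mean GP) of the residuals on the coordinates.
    """
    A = np.column_stack([np.ones(len(X)), X])
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # Covariance among surveyed locations, and between new and surveyed locations.
    K = exp_cov(coords, coords, sill, corr_range) + nugget * np.eye(len(coords))
    k_star = exp_cov(coords_new, coords, sill, corr_range)

    weights = np.linalg.solve(K, k_star.T)          # kriging weights, shape (n, m)
    resid_pred = weights.T @ resid                  # interpolated residuals
    var = sill - np.sum(k_star * weights.T, axis=1) # residual-field variance only
    return A_new @ beta + resid_pred, np.clip(var, 0.0, None)
```

In practice the covariance parameters would be estimated from the residuals (for example by fitting a variogram) rather than supplied by hand as they are here.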
The widespread earthquake damage to the built environment induces severe short- and long-term societal consequences. Better community resilience may be achieved through well-organized recovery. Decisions to organize the recovery process are taken under intense time pressure using limited, and potentially inaccurate, data on the severity and the spatial distribution of building damage. We propose to use Gaussian Process inference models to fuse the available inspection data with a pre-existing earthquake risk model to dynamically update regional post-earthquake damage estimates and thereby support a well-organized recovery. The proposed method consistently aggregates the gradually incoming building damage inspection data to reduce the uncertainty in ground shaking intensity geographic distribution and to update regional building damage estimates. The performance of the proposed Gaussian Process methodology is demonstrated on one fictitious earthquake scenario and two real earthquake damage datasets. A comparison with purely data-driven methods shows that the proposed method reduces the number of building inspections required to provide reliable and precise damage predictions.
... However, the unequal distribution of fatalities across castes and income groups observed in Fig. 1 reflects both uneven exposure and the increased vulnerability of building types occupied by these groups. For instance, there is a strong negative correlation between income and building vulnerability, with lower-income households tending to occupy unreinforced masonry buildings highly vulnerable to earthquakes [54]. ...
Societal efforts to understand and mitigate threats posed by hazards are often informed by complex disaster risk models. Despite research demonstrating the disproportionate effects of disasters on vulnerable groups, current risk modeling approaches lack robust methods to account for such equity concerns. Consequently, efforts to develop evidence-based disaster risk management interventions may lack awareness of differential risks in the settings where they are applied. Here, we draw on the relevant literature to develop a typology for characterizing current approaches to incorporating equity into risk modeling. Using this typology, we then evaluated 69 risk assessments conducted by major international development organizations. We found that only ~28% of risk models attempt a quantitative evaluation of the differential impacts of disasters and climate change. We then used an equity-sensitive approach to reconstruct a recent risk assessment and show that important elements are missed when equity is excluded in disaster risk modeling.
... These approaches can integrate with existing models and could significantly improve the accuracy and spatial resolution of our shaking and loss estimates. Our efforts build on earlier geospatial strategies that use satellite imagery to improve post-earthquake damage quantification needed for Post-Disaster Needs Assessments (PDNA; Loos et al., 2020). I do not address the rapidly expanding realm of Earthquake Early Warning (EEW), both in the United States and worldwide, because those systems are not in my purview. ...
The primary ingredients on the hazard side of the equation include the rapid characterization of the earthquake source and quantifying the spatial distribution of the shaking, plus any secondary hazards an earthquake may have triggered. On the earthquake impact side, loss calculations require the aforementioned hazard assessments—and their uncertainties—as input, plus the quantification of the exposure and vulnerability of structures, infrastructure, and the affected inhabitants. Lastly, effectively communicating uncertain estimates of the resulting impacts on society requires careful consideration of its function and form. All these aspects of rapid earthquake information delivery entailed wide‐ranging collaborative research and development among seismologists, earthquake engineers, geographers, social scientists, Information Technology professionals, and communication experts, leveraging diverse components and ingredients not achievable without extensive collaboration. I was very fortunate to be able to work on interesting and useful projects with many colleagues who got involved with them. Advances in content, its rapid delivery, and our ability to better communicate uncertain loss estimates greatly expanded the range of users and critical decision‐makers who could directly benefit from rapid post‐earthquake information. Moreover, in the critical user–developer feedback loop, we have intently followed requests from users to develop new ways of delivering the most‐requested post‐earthquake information within the limitations of the science and technology. Such new avenues and tools then motivated and prioritized additional research directions and developments.
... Importantly, the conceptual framework in this study could provide a basis for formulating and testing hypotheses in empirical research on the development of more equitable resource access policy and infrastructure resilience investment in vulnerable communities. The conceptual framework lays the required foundation to further develop, validate, train, and calibrate complementary computer models, machine learning and artificial intelligence (AI) systems for managing complex disaster relief missions throughout the disaster lifecycle [e.g., 17,24,28,32,36–40]. Modeling could augment empirical research and observational studies by providing initial estimates of demand for emergency food and water before any relief is delivered, when resources must be allocated with little or no information [16,39,41,42]. ...
After a disaster event such as an earthquake or a hurricane, performing comprehensive and detailed damage assessment of lifeline infrastructure is critical to effective disaster response. In recent years, there has been a rapid increase in the implementation of varying tools for this purpose. These tools and resulting datasets include satellites, drone imagery, LIDAR scans, water level sensors, structural strain gages, etc. Each of these tools differs in terms of purpose, the precision of the data collected, and the resources required for data collection and processing. To this point, these technologies have been deployed in the field in an ad hoc and often uncoordinated manner. Coordinating data collection efforts has the opportunity to provide more detailed, accurate, and comprehensive lifeline damage assessment through augmenting datasets, validating information, and filling in information gaps. However, this requires a comprehensive understanding of available tools and their specific characteristics. This paper fills this gap by providing a critical and comprehensive review of the tools available for post-disaster damage assessment. This work focuses on the tools used to assess physical damage in lifeline networks and buildings. Included are tools across lifeline networks, including water, gas, transportation, power, and building infrastructure, as well as across hazard types, including earthquakes and inundations resulting from hurricanes or extensive rain. Each tool is presented and critically analyzed along key dimensions including coverage, precision, and availability over time to provide insights into integrating datasets across tools and identify gaps in existing data collection approaches. 
The results form the basis for recommendations for improving post-disaster damage assessment, including coordinated data collection, leveraging geographical interdependencies to assess buried infrastructure, and the inclusion of the detailed characteristics of the tool in the metadata. This work is the first to provide a systematic and comprehensive analysis of the tools for building and lifeline infrastructure damage assessment and provides a basis for future integration of datasets and development of post-disaster data collection tools.
Rapid and efficient infrastructure restoration is critical to reducing the impacts of extreme events on community lifelines. Following a large-scale extreme event, infrastructure restoration at various stages is carried out simultaneously by agencies at various government levels and jurisdictions. Since each agency has different roles, responsibilities, and boundaries within which it operates, coordination and communication among them are challenging. With the overall goal of providing a common operating picture and facilitating concerted planning and action among emergency response agencies, this research proposes a data-driven and equity-centered framework that links the various stages—damage identification, restoration scheduling, and monitoring and control—of infrastructure restoration. This study takes a particular focus on the highway restoration caused by flood inundation. In detail, the framework is composed of three parts, including (1) a systematic data-driven approach that quickly provides spatially distributed estimates of highway inundation, (2) an equity-centered restoration scheduling strategy that prioritizes restoration tasks based on community social vulnerability, and (3) a Bayesian-based approach that provides an up-to-date indication of the impacts of component level changes on the overall restoration progress. A case study on highway inundation in Harris County during Hurricane Harvey was conducted to demonstrate the feasibility and applicability of the proposed framework. In the case study, multisource data, including physical highway topology, geospatial information, field inspection results, and socioeconomic and demographic data, were used. Our framework generates outputs that can be used for rapid damage identification, automated restoration scheduling, and real-time progress updating. 
In practice, these outputs facilitate quick and shared situational awareness among the involved agencies, which is expected to ease communication and coordination and help overcome challenges resulting from parallel and fragmented restoration efforts. To the authors’ best knowledge, this is the first framework that aims to support the management of infrastructure restoration by synthesizing various restoration stages.
Crowdsourced mapping has become an integral part of humanitarian response, with high profile deployments of platforms following the Haiti and Nepal earthquakes, and the multiple projects initiated during the Ebola outbreak in North West Africa in 2014, being prominent examples. There have also been hundreds of deployments of crowdsourced mapping projects across the globe that did not have a high profile. This paper, through an analysis of 51 mapping deployments between 2010 and 2016, complemented with expert interviews, seeks to explore the organisational structures that create the conditions for effective mapping actions, and the relationship between the commissioning body, often a non-governmental organisation (NGO), and the volunteers who regularly make up the team charged with producing the map. The research suggests that there are three distinct areas that need to be improved in order to provide appropriate assistance through mapping in humanitarian crises: regionalise, prepare and research. The paper concludes, based on the case studies, how each of these areas can be handled more effectively, concluding that failure to implement one area sufficiently can lead to overall project failure.
In the wake of large earthquake disasters, governments, international agencies, and large nongovernmental organizations scramble to conduct impact and damage assessments that help them understand the nature and scale of the emergency in order to orchestrate a complex series of emergency, response, and recovery activities. Using the Gorkha earthquake as a case study, this research seeks to provide greater clarity into the types of post-disaster damage assessments, their purposes, and their potential as catalysts for critical recovery activities. We argue that damage assessment methodologies need to be tailored to the diverse information needs in post-disaster contexts, which vary by user group and change over time. This research builds upon the authors' direct experience supporting the government of Nepal in the Post-Disaster Needs Assessment (PDNA) process, support with the rapid visual inspections conducted by the National Engineering Association, and interviews with humanitarian organizations who conducted damage assessment in Nepal.
The 25 April 2015 Mw 7.8 Gorkha earthquake caused more than 8000 fatalities and widespread building damage in central Nepal. The Italian Space Agency's COSMO-SkyMed Synthetic Aperture Radar (SAR) satellite acquired data over the Kathmandu area four days after the earthquake, and the Japan Aerospace Exploration Agency's Advanced Land Observing Satellite-2 SAR satellite acquired data over a larger area nine days after the mainshock. We used these radar observations and rapidly produced damage proxy maps (DPMs) derived from temporal changes in Interferometric SAR coherence. Our DPMs were qualitatively validated through comparison with independent damage analyses by the National Geospatial-Intelligence Agency and the United Nations Institute for Training and Research's United Nations Operational Satellite Applications Programme, and based on our own visual inspection of DigitalGlobe's WorldView optical pre- versus postevent imagery. Our maps were quickly released to responding agencies and the public, and used for damage assessment, determining inspection/imaging priorities, and reconnaissance fieldwork.
Quick building damage assessment following disasters such as large earthquakes serves to establish a preliminary estimation of losses and casualties. These datasets are completed by employing several crowdsourcing initiatives, in which volunteers and collaborators map damaged buildings in a given area at a qualitative damage scale based on a post-earthquake aerial or satellite image. Automating this process is a temptation and a technical issue, but manual interpretation remains essential, with the identification of moderate and lateral damage being the key and limiting factor. Following the Haiti 2010 earthquake, many studies were completed by crossing multilayer data gathered from different sources (satellite, aerial, and field survey). These works created a building damage dataset that enabled the construction of different sets of empirical vulnerability functions. In the present study, we proposed to review the datasets used for the damage assessment again, investigate how they can be managed for understanding urban damage patterns, and quantify the potentialities and limits of the sets. A high-resolution map of damage in Port-au-Prince was used to obtain a deducted map of intensity and was then compared to more detailed post-earthquake investigations such as the microzonation of the city (Belvaux et al., 2018). These detailed post-earthquake investigations, in which array microtremor measurements are performed for characterization of the subsurface soil, contribute to a better understanding of local variations in intensity. Subsequently, a retro damage scenario was run, considering the different sets of vulnerability functions (using the RISK-UE methodology vulnerability indexes) fitted with empirical vulnerability functions. Using the characterization of the exposure on a remote sensing basis, the results fit the heaviest damage well (building collapse), but they overestimated moderate damage states compared to the observations. 
However, is an aerial image based dataset sufficiently exhaustive for moderate damage, which is mostly visible from a lateral or internal point of view? Finally, we suggested some range of adjustments that can be applied to a vulnerability assessment originating from remote sensing data such that it can be used more accurately in the detection of urban damage, even for moderate damage degrees.
Following a significant earthquake, ground-motion observations are available for a limited set of locations and intensity measures (IMs). Typically, however, it is desirable to know the ground motions for additional IMs and at locations where observations are unavailable. Various interpolation methods are available, but because IMs or their logarithms are normally distributed, spatially correlated, and correlated with each other at a given location, it is possible to apply the conditional multivariate normal (MVN) distribution to the problem of estimating unobserved IMs. In this article, we review the MVN and its application to general estimation problems, and then apply the MVN to the specific problem of ground-motion IM interpolation. In particular, we present (1) a formulation of the MVN for the simultaneous interpolation of IMs across space and IM type (most commonly, spectral response at different oscillator periods) and (2) the inclusion of uncertain observation data in the MVN formulation. These techniques, in combination with modern empirical ground-motion models and correlation functions, provide a flexible framework for estimating a variety of IMs at arbitrary locations.
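The conditional multivariate normal step that this abstract refers to is the standard Gaussian conditioning formula, sketched below. This is a generic illustration under stated assumptions and not the article's formulation: the function name `condition_mvn` and its argument layout are hypothetical, and the sketch treats observations as exact, so it omits the uncertain-observation extension the abstract mentions.

```python
import numpy as np

def condition_mvn(mean, cov, obs_idx, obs_values):
    """Condition a multivariate normal on exactly observed components.

    Returns the indices of the unobserved components together with their
    conditional mean and covariance:
        mu_1|2    = mu_1 + S12 S22^{-1} (x_2 - mu_2)
        Sigma_1|2 = S11 - S12 S22^{-1} S21
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    obs_idx = np.asarray(obs_idx)
    unobs_idx = np.setdiff1d(np.arange(len(mean)), obs_idx)

    S11 = cov[np.ix_(unobs_idx, unobs_idx)]
    S12 = cov[np.ix_(unobs_idx, obs_idx)]
    S22 = cov[np.ix_(obs_idx, obs_idx)]

    # gain = S12 S22^{-1}, computed with a linear solve instead of an explicit inverse.
    gain = np.linalg.solve(S22, S12.T).T
    innovation = np.asarray(obs_values, dtype=float) - mean[obs_idx]
    cond_mean = mean[unobs_idx] + gain @ innovation
    cond_cov = S11 - gain @ S12.T
    return unobs_idx, cond_mean, cond_cov
```

For ground-motion interpolation, the entries of `mean` and `cov` would come from a ground-motion model and a spatial or cross-measure correlation function; those inputs are outside the scope of this sketch.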
Assessment of human casualties in earthquakes has become a topic of vital importance for national and urban authorities responsible for emergency provision, for the development of mitigation strategies and for the development of adequate insurance schemes. In the last few years important work has been carried out on a number of recent events (including earthquakes in Kocaeli, Turkey 1999; Niigata, Japan 2004; Sichuan, China 2008; and L'Aquila, Italy 2009). These events have created new and detailed casualty data, which has not until now been properly assembled and evaluated. This book draws the new evidence from recent events together with existing knowledge. It summarises current trends in the understanding of the factors influencing the numbers and types of casualties in earthquakes; it offers methods to incorporate this understanding into the estimation of losses in future events in different parts of the world; it discusses ways in which pre-event mitigation activity and post-event emergency management can reduce the toll of casualties in future events; and it identifies future research needs. Audience: This book will be of interest to scientists and professionals in engineering, geography, emergency management, epidemiology and the insurance industry.
We studied the earthquake mortality rates for more than 4,500 worldwide earthquakes since 1973 and developed an empirical country- and region-specific earthquake vulnerability model to be used as a candidate for post-earthquake fatality estimation by the U.S. Geological Survey’s Prompt Assessment of Global Earthquakes for Response (PAGER) system. Earthquake fatality rate is defined as the ratio of the total number of shaking-related fatalities to the total population exposed at a given shaking intensity (in terms of the Modified Mercalli (MM) shaking intensity scale). An atlas of global ShakeMaps developed for the PAGER project (Allen and others, 2008) and the Landscan 2006 population database developed by Oak Ridge National Laboratory (Dobson and others, 2000; Bhaduri and others, 2002) provide the global hazard and population exposure information necessary for the development of the fatality rate. The earthquake fatality rate function is expressed in terms of a two-parameter lognormal cumulative distribution function. The objective function (norm) is defined in such a way that we minimize the residual error in hindcasting past earthquake fatalities. The earthquake fatality rate is based on past fatal earthquakes (earthquakes causing one or more deaths) in individual countries where at least four fatal earthquakes occurred during the catalog period. All earthquakes that have occurred since 1973 (fatal or non-fatal) were included in order to constrain the fatality rates for future estimations. Only a few dozen countries have experienced four or more fatal earthquakes since 1973; hence, we needed a procedure to derive regional fatality rates for countries that had not had enough fatal earthquakes during the catalog period. We propose a new global regionalization scheme based on idealization of countries that are expected to have similar susceptibility to future earthquake losses given the existing building stock, its vulnerability, and other socioeconomic characteristics. 
The fatality estimates obtained using an empirical country- or region-specific model will be used along with other selected engineering risk-based loss models (semi-empirical and analytical) in the U.S. Geological Survey’s Prompt Assessment of Global Earthquakes for Response (PAGER) system for generation of automated earthquake alerts. These alerts could potentially benefit the rapid earthquake response agencies and governments for better response to reduce earthquake fatalities. Fatality estimates are also useful to stimulate earthquake preparedness planning and disaster mitigation. The proposed model has several advantages as compared with other candidate methods, and the country- or region-specific fatality rates can be readily updated when new data become available.
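The fatality rate described in this abstract, a two-parameter lognormal cumulative distribution function of shaking intensity multiplied by the exposed population, can be sketched as follows. The parameter names `theta` and `beta`, the function names, and the dictionary layout of the exposure are assumptions made for illustration; no country-specific parameter values from the study are reproduced here.

```python
import math

def fatality_rate(mmi, theta, beta):
    """Two-parameter lognormal CDF of shaking intensity: Phi(ln(mmi / theta) / beta).

    theta and beta are hypothetical names for the two fitted parameters.
    """
    z = math.log(mmi / theta) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_fatalities(exposure, theta, beta):
    """Sum rate x exposed population over intensity levels.

    exposure maps a shaking intensity (MMI) to the population exposed at it.
    """
    return sum(fatality_rate(mmi, theta, beta) * pop
               for mmi, pop in exposure.items())
```

Because the rate is a cumulative distribution function, it increases monotonically with intensity and equals 0.5 when the intensity equals `theta`, which gives a quick sanity check on any fitted parameter pair.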
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.