Deep Learning for Geophysics: Current and Future Trends

Siwei Yu (1) and Jianwei Ma (2)

(1) School of Mathematics, Institute of Artificial Intelligence, Harbin Institute of Technology, Harbin, China; (2) School of Earth and Space Sciences, Center of Artificial Intelligence Geosciences, Peking University, Beijing, China

Key Points:
The concept of deep learning (DL) and classical architectures of deep neural networks are introduced
A review of state-of-the-art DL methods in geophysical applications is provided
The future directions for developing new DL methods in geophysics are discussed

Correspondence to: J. Ma, jwm@pku.edu.cn

Citation: Yu, S., & Ma, J. (2021). Deep learning for geophysics: Current and future trends. Reviews of Geophysics, 59, e2021RG000742. https://doi.org/10.1029/2021RG000742

Received 11 MAR 2021; Accepted 25 MAY 2021

Abstract: Recently, deep learning (DL), as a new data-driven technique compared to conventional approaches, has attracted increasing attention in the geophysical community, resulting in many opportunities and challenges. DL has proven to have the potential to predict complex system states accurately and relieve the "curse of dimensionality" in large temporal and spatial geophysical applications. We address the basic concepts, state-of-the-art literature, and future trends by reviewing DL approaches in various geoscience scenarios. Exploration geophysics, earthquakes, and remote sensing are the main focuses. More applications, including Earth structure, water resources, atmospheric science, and space science, are also reviewed. Additionally, the difficulties of applying DL in the geophysical community are discussed. The trends of DL in geophysics in recent years are analyzed. Several promising directions are provided for future research involving DL in geophysics, such as unsupervised learning, transfer learning, multimodal DL, federated learning, uncertainty estimation, and active learning. A coding tutorial and a summary of tips for rapidly exploring DL are presented for beginners and interested readers of geophysics.

Plain Language Summary: With the rapid development of artificial intelligence (AI), students and researchers in the geophysical community would like to know what AI can bring to geophysical discoveries. We present a review of deep learning (DL), a popular AI technique, for geophysical readers to understand recent advances, open problems, and future trends. This review aims to pave the way for more geophysical researchers, students, and teachers to understand and use DL techniques.

© 2021. The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Geophysics is a discipline that uses physical principles and methods to investigate and characterize the Earth, from the Earth's core to its surface. Modern geophysics extends to outer space, from the outer layers of the Earth's atmosphere to other planets. The general methods of geophysics consist of data observation, processing, modeling, and prediction. Observation is an essential means by which humans come to understand unknown geophysical phenomena. Data observation mainly uses noninvasive techniques such as seismic waves, gravity fields, and remote sensing. Data processing techniques, including denoising and reconstruction, retrieve useful information from raw observations. Mathematical modeling based on physical laws helps to characterize geophysical phenomena. Predictions infer the unknown from the known data and models. Spatial predictions are used to uncover the Earth's interior, as in exploration geophysics, which images the physical properties of the subsurface. Temporal predictions provide the historical or future states of the Earth, as in weather forecasting.
With the advance of acquisition equipment, the amount of observed geophysical data is increasing at an impressive speed. How to utilize such a large amount of data for processing, modeling, and prediction is a significant problem, and addressing it could help remove some of the bottlenecks of traditional geophysical methods. Taking modeling as an example, one of the most challenging tasks is to characterize the Earth with high resolution. However, hardware limitations impose a trade-off in traditional methods that prevents simultaneously achieving both high resolution and a wide observation range. Therefore, it is nearly impossible to obtain a high-resolution model of the Earth, either spatially or temporally, since the Earth has an extremely large spatial and temporal scale. An Earth system numerical simulation facility in China, called EarthLab (Li, Bao, et al., 2019), can at most provide a resolution of 25 km
for the atmosphere and 10 km for the oceans, based on a high-performance computing device with 15 PFLOPS (peta floating-point operations per second). Several specific difficult tasks in geophysics are listed in Table 1.

Table 1. Examples of Data-Driven Tasks in Geophysics
To illustrate the bottlenecks in processing and prediction, we use exploration geophysics as an example. Exploration geophysics aims to observe the Earth's subsurface, or that of other planets, with data collected at the surface, such as seismic and gravity fields. The main process of exploration geophysics includes pre-processing and imaging, where imaging means predicting the subsurface structures. In the geophysical signal pre-processing stage, the simplest assumption regarding the shapes of underground layers is that the reflective seismic records are linear in small windows (Spitz, 1991). The sparsity assumption presumes that the data are sparse under certain transforms (Donoho & Johnstone, 1995), such as the curvelet domain (Herrmann & Hennenfent, 2008) or other time-frequency domains (Mousavi, Langston, et al., 2016; Mousavi & Langston, 2016, 2017). The low-rank assumption supposes that the data are low-rank after the Hankel transform (Oropeza & Sacchi, 2011). However, the predesigned linear assumption or sparse transform assumption is not adaptive to different types of seismic data and may lead to low denoising or interpolation quality for data with complex structures. In the geophysical imaging stage, wave equations are fundamental tools that govern the kinematics and dynamics of seismic wave propagation. Acoustic, elastic, and viscoelastic wave equations introduce an increasing number of physical factors, and the generated wavefield records can estimate real scenarios increasingly precisely. However, as the wave equation becomes increasingly complex, its numerical implementation becomes nontrivial, and the computational cost increases considerably for large-scale scenarios.

Figure 1. An illustration of model-driven and data-driven methods. On the left are the research topics in geophysics, ranging from the Earth's core to outer space. On the right are the observation means used at present. In the middle are examples of model-driven and data-driven methods. In model-driven methods, the principles of geophysical phenomena are induced from a large amount of observed data based on physical causality; the models are then used to deduce the geophysical phenomena in the future or in the past. In data-driven methods, the computer first induces a regression or classification model without considering physical causality. Then, this model performs tasks such as classification on incoming datasets.

Figure 2. The containment relationships among artificial intelligence, machine learning, neural networks, and deep learning, and the classification of deep learning approaches.
Different from traditional model-driven methods, machine learning (ML) is a type of data-driven approach that trains a regression or classification model, through a complex nonlinear mapping with adjustable parameters, based on a training data set. Model-driven and data-driven approaches are compared in Figure 1. For decades, ML methods have been widely adopted in various geophysical applications, such as exploration geophysics (Huang et al., 2006; Helmy et al., 2010; Jia & Ma, 2017; Lim, 2005; Poulton, 2002; Zhang et al., 2014), earthquake localization (Mousavi, Horton, et al., 2016), aftershock pattern analysis (DeVries et al., 2018), and Earth system analysis (Reichstein et al., 2019). A review article about ML in solid Earth geoscience was recently published in Science (Bergen et al., 2019). The topics include a variety of ML techniques, from traditional methods, such as logistic regression, support vector machines, random forests, and neural networks, to modern methods, such as deep neural networks and deep generative models. The article stresses that ML will play a key role in accelerating the understanding of the complex, interacting, and multiscale processes of the Earth's behavior.

Figure 3. (a) and (b) are statistics of artificial intelligence (AI)-related papers in the SEG and AGU libraries. In (a), Geophysics means the flagship journal of SEG, SEG Expanded Abstracts means the expanded abstracts from the SEG annual meeting, and SEG Library papers means the papers found in the SEG digital library. In (b), the first three captions in the legend are the names of top journals in AGU; the fourth caption represents the papers found in the AGU digital library.

Figure 4. The topics included in this review. (a) Deep learning (DL)-based geophysical applications. (b) The future trends of applying DL in geophysics.
In the ML community, an artificial neural network (ANN) is one such regression or classification model; it is analogous to the human brain and consists of layers of neurons. An ANN with more than one layer, that is, a deep neural network (DNN), is the core of a recently developed ML method named deep learning (DL) (LeCun et al., 2015). DL mainly encompasses supervised and unsupervised approaches, depending on whether labels are available or not, respectively. Supervised approaches train a DNN by matching the inputs and labels and are usually used for classification and regression tasks. Unsupervised approaches update the parameters by building a compact internal representation and are then used for clustering or pattern recognition. In addition, DL also contains semi-supervised learning, where partial labels are available, and reinforcement learning, where a human-designed environment provides feedback to the DNN. Figure 2 summarizes the relationship from artificial intelligence to DL and the classification of DL approaches. DL has shown potential in overcoming the limitations of traditional approaches in various areas. The performance of DL is even superior to that of humans in specific tasks, such as image classification (5.1% vs. 3.57% top-5 classification error; He et al., 2016) and the game of Go.
The geophysical community has shown great interest in DL in recent years. Figure 3 shows the published papers related to artificial intelligence in two major geophysical societies, the Society of Exploration Geophysicists (SEG) and the American Geophysical Union (AGU). Clear exponential growth is observed in both libraries due to the use of DL techniques. Moreover, DL has provided several astonishing results for the geophysical community. For instance, on the STanford EArthquake Data set (STEAD), the earthquake detection accuracy is improved to 100%, compared to the 91% accuracy of the traditional STA/LTA (short-term average over long-term average) method (Mousavi, Zhu, Sheng, et al., 2019; Mousavi et al., 2020). DL makes characterizing the Earth with high resolution on a large scale possible (Chattopadhyay et al., 2020; Chen et al., 2019; Zhang, Stanev, & Grayek, 2020). DL can even be used for discovering physical concepts (Iten et al., 2020), such as the heliocentricity of the solar system.
Our review introduces DL-related literature covering a variety of geophysical applications, from the Earth's deep core to distant outer space, and mainly focuses on exploration geophysics, earthquake science, and remote sensing, a geophysical data observation method. This review intends to first provide a glance at the most recent DL research related to geophysics, along with an analysis of the changes and challenges DL brings to the geophysical community, and then discusses future trends. Figure 4 presents the topics included in this review. In addition, we provide a cookbook for beginners who are interested in DL, from geophysical students to researchers.
The introduction above has briefly described the background of geophysics and DL. The remaining content consists of four sections. The second section introduces the basic concepts and ideas of DL (Section 2). The third section reviews DL applications in geophysical areas (Section 3). A discussion of future trends (Section 4) is given as an extension of this review. The fifth section (Section 5) summarizes this review. A tutorial section for beginners is given in the appendix.
2. The Theory of Deep Learning
Readers who are already familiar with the general theory of DL may skip to Section 3. We denote scalars by italic letters, vectors by bold lowercase letters, and matrices by bold uppercase letters. In geophysics, a large number of regression or classification tasks can be reduced to

$\mathbf{y} = \mathbf{L}\mathbf{x}$,  (1)

where x stands for the unknown parameters, y stands for the observation, which we partially know, and L is a forward or degradation operator in geophysical data observation, such as noise contamination, subsampling, or a physical response. However, L is usually ill-conditioned, not invertible, or even unknown. The inverse of L is mainly approximated via two routines: physical model-driven and data-driven. In physical model-driven routines, an optimization objective (loss) function is established with an additional constraint, such as the sparsity constraint in dictionary learning. In data-driven routines, given an extensive training set, a mapping between x and y is established by training, as done in DL, which is especially suitable for situations where L is not precisely known.
To bring the reader into DL gradually, this paper first introduces another approach, dictionary learning (Aharon et al., 2006), since the theoretical frameworks of dictionary learning and DL are similar. In dictionary learning, an adaptive dictionary is learned as a representation of the target data. The key features of dictionary learning are single-level decomposition, unsupervised learning, and linearity. Single-level decomposition means that one dictionary is used to represent a signal. Unsupervised learning means no labels are provided during dictionary learning; moreover, only the target data are used, without an extensive training set. Linearity implies that the decomposition of the data on the dictionary is linear. These features make the theory of dictionary learning simple. This review will help readers transfer existing knowledge of dictionary learning to DL.
2.1. Dictionary Learning
To solve Equation1, an optimization function E(x;y) with a regularization term R is constructed:

;,ED R
x y Lx y x
(2)
where D is a similarity measurement function. Typically, the L2-norm
Lx y
2
is used under the assump-
tion of Gaussian distribution for the error. Tikhonov regularization (
Rxx


2
2
) and sparsity are two pop-
ular regularization terms. In sparsity regularization,
R


1
, where W is a sparse transform with
several vectorized bases. W is also termed as the dictionary. The goal of dictionary learning is to train an
optimized sparse transform W, which is used for the sparse representation of x. The objective function of
dictionary learning involves learning W via matrix decomposition with constraints Rw and Rv on the dic-
tionary W and coefficient v,




,,
wv
E D RR
T
W v LW v y W v
(3)
where W and v are optimized alternatively, that is, dictionary updating and sparse coding. Here we intro-
duce two dictionary learning approaches: K-SVD and data-driven tight frame (DDTF).
Figure 5. An illustration of dictionary learning: data-driven tight frame. The dictionary is initialized with a spline
framelet. After training based on a post-stack seismic data set, the trained dictionary exhibits apparent structures.
K-SVD (where SVD is singular value decomposition) (Aharon et al., 2006) regularizes the sparsity of v and normalizes the energy of W. K-SVD uses orthogonal matching pursuit for sparse coding and several tricks in dictionary updating. First, one component of the dictionary is updated at a given time, and the remaining terms are fixed. Second, a rank-1 approximation SVD algorithm is used to obtain the updated dictionary and coefficients simultaneously, thereby accelerating convergence and reducing computational memory. K-SVD has been applied in geophysics, with extensions to improve its efficiency (Nazari Siahsar et al., 2017).

Despite the success of K-SVD in signal enhancement and compression, dictionary updating is still time-consuming for high-dimensional and large-scale datasets, such as 3D prestack data in seismic exploration. K-SVD requires one SVD step to update each dictionary term. Can the entire dictionary be updated with a single SVD for better efficiency? The data-driven tight frame (Cai et al., 2014; Liang et al., 2014) was proposed by enforcing a tight frame constraint on the dictionary W. The tight frame condition is slightly weaker than orthogonality, and the perfect reconstruction property holds under it. With the tight frame property, dictionary updating in DDTF is achieved with one SVD, which is hundreds of times faster than K-SVD. DDTF has been applied in high-dimensional seismic data reconstruction (Yu et al., 2015, 2016). An example of a dictionary learned with 3D DDTF on a seismic volume is shown in Figure 5.
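To make the alternation in Equation 3 concrete, the following minimal NumPy sketch learns an orthogonal patch dictionary by alternating hard-thresholded sparse coding with a single SVD-based dictionary update (an orthogonal Procrustes step). The orthogonality constraint, the hard threshold, and the random initialization are simplifying assumptions for illustration; this conveys the spirit of a one-SVD update such as DDTF's, not the exact published algorithm.

import numpy as np

def learn_dictionary(patches, n_iter=10, threshold=0.1):
    """Alternate sparse coding and a one-SVD dictionary update.

    patches: (n_patches, patch_dim) array of vectorized data patches.
    Returns an orthogonal dictionary W of shape (patch_dim, patch_dim).
    """
    dim = patches.shape[1]
    W = np.linalg.qr(np.random.randn(dim, dim))[0]   # random orthogonal start
    for _ in range(n_iter):
        # Sparse coding: coefficients of the patches in the current dictionary,
        # followed by hard thresholding to enforce sparsity (the R_v constraint).
        V = patches @ W
        V[np.abs(V) < threshold] = 0.0
        # Dictionary updating: one SVD solves the orthogonal Procrustes problem,
        # giving the orthogonal W that best maps the patches to the sparse V.
        U, _, Vt = np.linalg.svd(patches.T @ V)
        W = U @ Vt
    return W

A dictionary learned this way can then serve as the sparse transform W in the regularization term of Equation 2.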
2.2. Deep Learning
Unlike dictionary learning, DL treats geophysical problems as classification or regression problems. A DNN F is used to approximate x from y:

$\mathbf{x} = F(\mathbf{y}; \Theta)$,  (4)

where $\Theta$ is the parameter set of the DNN. In classification tasks, x is a one-hot encoded vector representing the categories. $\Theta$ is obtained by building a high-dimensional approximation between two sets $X = \{\mathbf{x}_i, 1 \le i \le N\}$ and $Y = \{\mathbf{y}_i, 1 \le i \le N\}$, that is, the labels and inputs. The approximation is achieved by minimizing the following loss function to obtain an optimized $\Theta$:

$E(\Theta; X, Y) = \sum_{i=1}^{N} \|F(\mathbf{y}_i; \Theta) - \mathbf{x}_i\|_2^2$.  (5)

If F is differentiable, a gradient-based method can be used to optimize $\Theta$. However, a large Jacobian matrix is involved when calculating $\partial E / \partial \Theta$ directly, making this infeasible for large-scale datasets. The back-propagation method (Rumelhart et al., 1986) was proposed to compute $\partial E / \partial \Theta$ while avoiding the explicit Jacobian. In unsupervised learning, the label x is not known, so additional constraints are required, such as making the output identical to y.

Figure 6. The learned features in deep learning. (a) Training samples. (b) Nine of the learned filters in each layer. A great number of hierarchical structures are observed in the different layers. Layer 1 exhibits edge structures, layer 2 shows small structures of seismic events, and layer 3 shows small portions of seismic sections. The filters in layers 2 and 3 are blank near the edges, which may be caused by the boundary effect of the convolutional filters. Layer 4 gives larger seismic portions, which are approximations of the training data. The filters in layer 4 look more similar to each other than the training samples do because the deep neural network (DNN) tries to learn the similar, hierarchical patterns that compose the data.
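As a minimal illustration of Equation 5, the following PyTorch sketch fits a small network F(y; Θ) to paired sets of inputs and labels using back-propagation and gradient descent; the two-layer architecture and the random toy data are illustrative assumptions only.

import torch

# Toy paired sets: inputs Y and labels X in the notation of Equation 5.
Y = torch.randn(256, 64)
X = torch.randn(256, 32)

# A small DNN F(y; theta): linear, nonlinear, linear.
F_net = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 32))

opt = torch.optim.SGD(F_net.parameters(), lr=1e-3)
for epoch in range(100):
    loss = ((F_net(Y) - X) ** 2).mean()   # Equation 5, averaged over samples
    opt.zero_grad()
    loss.backward()                       # back-propagation computes dE/dTheta
    opt.step()                            # gradient step updates Theta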
DL relates to dictionary learning through the depth of decomposition, the amount of training data, and the nonlinear operators. Dictionary learning is usually a single-level matrix decomposition problem. Double sparsity (DS) dictionary learning was proposed to explore deeper decomposition (Rubinstein et al., 2010). The motivation of DS is that, for a generic dictionary, the learned dictionary atoms still share several underlying sparse patterns. In other words, the dictionary is represented as a sparse coefficient matrix multiplied by a fixed dictionary, such as the discrete cosine transform. Inspired by DS dictionary learning, can we propose triple, quadruple, or even centuple dictionary learning? We know that cascaded linear operators are equivalent to a single linear operator; therefore, using more than one fixed dictionary does not improve the signal representation ability over one fixed dictionary unless additional constraints are provided. In DL, nonlinear operators are combined in such a deep structure. An ANN with one hidden layer and nonlinear operators can represent any complex function given a sufficient number of hidden neurons. To fit an ANN with many hidden neurons, we need an extensive training set, whereas dictionary learning involves only the target data. In comparison to the learned dictionary features in Figure 5, the hierarchical structures of the filters in DL are shown in Figure 6.
Figure 7. Understanding deep learning (DL) from different perspectives. Optimization: DL is basically a nonlinear optimization problem that solves for the optimized parameters minimizing the loss between the outputs and labels. Dictionary learning: the filter training in DL is similar to that in dictionary learning. High-dimensional mapping: the deep neural network (DNN) in DL is basically a high-dimensional mapping from the input to the labels. Optimal transport: a generative adversarial network can be interpreted by the theory of optimal transport, which involves the transformation between the given white noise and the data distribution. Manifold learning: the representation of training samples in the latent space of a DNN is similar to learning a low-dimensional manifold that contains all the data samples. Ordinary differential equation: a recurrent neural network is basically a solution of an ordinary differential equation with the Euler method.
The theory of DL can be understood from angles other than dictionary learning (Figure 7). On the one hand, DL can be treated as an ultra-high-dimensional nonlinear mapping from the data space to the feature or target space, where the nonlinear mapping is represented by a DNN. Therefore, DL is basically a high-dimensional nonlinear optimization problem. On the other hand, recurrent neural networks (RNNs) are basically a solution of an ordinary differential equation with the Euler method (Chen et al., 2018). A generative adversarial network (GAN) (Creswell et al., 2018; Goodfellow et al., 2014) can also be interpreted through the theory of optimal transport, since the targets of a GAN are mainly manifold learning and probability distribution transformation, that is, transformation between the given white noise and the data distribution (Lei et al., 2020). RNNs and GANs are two specific DNNs and will be introduced in the next subsection.

Figure 8. Sketches of deep neural networks (DNNs). The blue lines indicate inputs, and the orange lines indicate outputs. The lengths of the blue and orange lines represent the data dimensions. The green lines indicate intermediate connections. (a) In a fully connected neural network (FCNN), the inputs of one layer are connected to every unit in the next layer. f stands for a nonlinear activation function. In (b)–(f), we omit the details of the layers and keep the shape of each network architecture. (b) A vanilla convolutional neural network (CNN) is a cascade of convolutional layers, pooling layers, nonlinear layers, etc. In a CNN, the outputs of the convolutional layers are either the same size as or smaller than the inputs, depending on the strides used for convolution. Pooling layers reduce the size of the extracted features. In regression or classification tasks, the output usually has the same dimension as or a smaller dimension than the input ((b) shows the latter situation). The difference between regression and classification is that the outputs are continuous variables in regression tasks and discrete variables representing categories in classification tasks. (c) The dimension of the latent feature space in a convolutional autoencoder (CAE) may be either larger or smaller than that of the data space ((c) shows the latter). (d) Skip connections in U-Net bring the low-level features to a high level. (e) In a generative adversarial network (GAN), low-dimensional random vectors are used to generate a sample with the generator, and then the sample is classified as true or false by the discriminator. (f) In a recurrent neural network (RNN), the output or hidden state of the network is fed back as input in a cycle.

Figure 9. Details of deep neural network (DNN) architectures. (a) Activation functions in the nonlinear layer. ReLU is commonly used since its gradient is easily computed and it helps avoid vanishing gradients. (b) A typical block in a convolutional neural network (CNN). The convolutional layer and ReLU (nonlinear) layer are the basic components of a CNN block. The batch normalization layer helps avoid gradient explosion. The pooling layer extracts features by subsampling the input.
2.3. Deep Neural Network Architectures
The key components of DL are the training set, the network architecture, and parameter optimization. The architectures of DNNs vary across applications; here, we introduce several commonly used architectures.

A fully connected neural network (FCNN) (Figure 8a) is an ANN composed of fully connected layers, in which the inputs of one layer are connected to every unit in the next layer. In each unit, the weighted summation of the inputs passes through a nonlinear activation function f. Typical choices of f in DL are the rectified linear unit (ReLU), sigmoid, and tanh functions, as shown in Figure 9a. The number of layers in an FCNN has a significant effect on the fitting and generalization abilities of the model. However, FCNNs were long restricted to a few layers due to the computational capacity of the available hardware, the vanishing and exploding gradient problems during optimization, and so on. With the development of hardware and optimization algorithms, ANNs tend to become deeper. On the other hand, if raw data are input directly into an FCNN, a massive number of parameters is required, since each pixel corresponds to one feature, especially for high-dimensional inputs. Features basically reduce the dimension at the input layer and, as a result, the number of parameters in the model. However, an FCNN requires preselected features, which rely fully on experience, and it ignores the structure of the input entirely. Automated feature selection algorithms have been proposed (Qi et al., 2020) but require substantial computational resources. To reduce the number of parameters in an FCNN and consider the local coherency of an image, convolutional neural networks (CNNs) (Figure 8b) were proposed, which share network parameters through convolutional filters.
CNNs have developed rapidly since 2010 for image classification and segmentation; popular CNNs include VGGNet (Simonyan & Zisserman, 2015) and AlexNet (Krizhevsky et al., 2017). CNNs are also used in image denoising (Zhang, Zuo, Chen, et al., 2017) and super-resolution tasks (Dong et al., 2014). A CNN uses the original data rather than selected features as input and uses convolutional filters to restrict the inputs of a neuron to a local range. The convolutional filters are shared by different neurons in the same layer. As shown in Figure 9b, one typical block in a CNN consists of one convolutional layer, one nonlinear layer, one batch normalization layer, and one pooling layer. Convolutional layers and nonlinear layers are the basic components of a CNN. Batch normalization layers prevent gradient explosion and stabilize training. Pooling layers subsample the input to extract key features. The simplest CNNs, with simple sequential structures, are called vanilla CNNs (likewise for vanilla FCNNs). Vanilla CNNs are reliable for most applications in geophysics, such as denoising, interpolation, velocity modeling, and data interpretation, if many training samples and labels are available. A CNN is invariant to small changes in the inputs due to the pooling layers. However, pooling layers lose information, so a CNN cannot characterize the changes in the input. Capsule networks (Sabour et al., 2017) were proposed to simultaneously keep the invariance and characterize the changes. This is achieved by replacing scalars with vectors as the inputs and outputs of the neurons: the length of a vector represents the probability that an entity exists, and its orientation stands for the parameters of the entity.
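The typical CNN block of Figure 9b (convolution, batch normalization, nonlinearity, pooling) can be sketched in PyTorch as follows; the channel counts and kernel sizes are arbitrary illustrative choices.

import torch

# One vanilla CNN block: convolution -> batch normalization -> ReLU -> pooling.
block = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, kernel_size=3, padding=1),  # shared convolutional filters
    torch.nn.BatchNorm2d(16),                          # stabilizes training
    torch.nn.ReLU(),                                   # nonlinear layer
    torch.nn.MaxPool2d(2),                             # subsampling feature extraction
)
features = block(torch.randn(8, 1, 64, 64))  # e.g., 8 single-channel seismic patches
print(features.shape)  # (8, 16, 32, 32): pooling halves the spatial size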
More DL network architectures have been proposed for specific tasks based on vanilla FCNNs or CNNs. An autoencoder learns to reconstruct the inputs through useful representations with an encoder and a decoder (Makhzani, 2018). The encoder uses nonlinear layers to map the inputs to a latent space. The decoder uses nonlinear layers to decode the latent features back into the original data space. Autoencoders are trained in a self-supervised manner. To obtain meaningful representations, additional constraints are imposed on the network. For example, undercomplete autoencoders limit the size of the latent space to be smaller than that of the inputs, such that the encoder extracts critical features. Sparse autoencoders are usually overcomplete, with a latent space larger than the input space, and impose a sparsity regularization on the latent space. Denoising autoencoders and contractive autoencoders learn useful representations by making the autoencoder robust to variations in the input. Convolutional autoencoders (CAEs, Figure 8c) use convolutional layers in the encoder and deconvolutional layers in the decoder.
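A minimal convolutional autoencoder in this spirit pairs a strided convolutional encoder with a transposed-convolution (deconvolutional) decoder and is trained self-supervised to reconstruct its input; the layer sizes below are illustrative assumptions.

import torch

# Encoder: strided convolution maps the data to a smaller latent space.
encoder = torch.nn.Sequential(
    torch.nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), torch.nn.ReLU())
# Decoder: transposed convolution maps latent features back to the data space.
decoder = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(8, 1, kernel_size=4, stride=2, padding=1))

x = torch.randn(4, 1, 32, 32)         # e.g., four seismic patches
recon = decoder(encoder(x))
loss = ((recon - x) ** 2).mean()      # self-supervised reconstruction loss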
U-Nets (Ronneberger et al., 2015) (Figure 8d) have U-shaped structures and skip connections. The skip connections bring low-level features to high levels. U-Net was first proposed for image segmentation and has since been applied in seismic data processing, inversion, and interpretation. The U-shaped structure, with a contracting path and an expanding path, lets every data point in the output draw on information from the entire input, making the approach suitable for mapping data between different domains, such as inverting velocity from seismic records. For a trained U-Net, the input size of the test set must be the same as that of the training set; the data need to be processed patch-wise if their size does not match the network's requirement.
A GAN (Figure8e) can be applied in adversarial training with one generator to produce a fake image or any
other type of data and one discriminator to distinguish the produced one from the real ones. When training
the discriminator, the real data set and generated data set correspond to labels one and zero, respectively.
Additionally, when the generator is trained, all datasets correspond to the label one. Such a game will finally
allow the generative network to produce fake images that the discriminative network cannot distinguish
from real images. A GAN is used to generate samples with similar distributions as the training set. The gen-
erated samples are used for simulating realistic scenarios or expanding the training set. An extended GAN,
named CycleGAN, was proposed with two generators and two discriminators for signal processing (Zhu
etal.,2017). In CycleGAN, a two-way mapping is trained for mapping two datasets from one to the other.
The training set of CycleGAN is not necessarily paired as in a vanilla CNN, which makes it relatively easy
to construct training sets in geophysical applications.
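The label convention described above (one for real and zero for generated samples in the discriminator step; one for all generated samples in the generator step) translates directly into two alternating loss computations. The sketch below uses toy fully connected networks as stand-ins for the generator and discriminator.

import torch

# Toy fully connected stand-ins for the generator G and discriminator D.
G = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                        torch.nn.Linear(32, 64))
D = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                        torch.nn.Linear(32, 1), torch.nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = torch.nn.BCELoss()
ones, zeros = torch.ones(16, 1), torch.zeros(16, 1)

real = torch.randn(16, 64)     # stand-in for a batch of real samples
noise = torch.randn(16, 8)     # low-dimensional random input vectors

# Discriminator step: real samples get label one, generated samples label zero.
d_loss = bce(D(real), ones) + bce(D(G(noise).detach()), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: all generated samples get label one to fool the discriminator.
g_loss = bce(D(G(noise)), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()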
RNNs (Figure8f) are commonly used for tasks related to sequential data, where the current state depends
on the history of inputs fed into the neural network. Long short-term memory (LSTM) (Hochreiter &
Schmidhuber,1997) is a widely used RNN that considers how much historical information is forgotten or
remembered. The main advantage of LSTM is in handling longer time duration of data compared to the
vanilla RNN, which has vanishing gradient problem for long sequences. Therefore, the inference accuracy
of LSTM increases with the amount of historical information considered. Gated recurrent unit (GRU) (Cho
etal.,2014) is a variant of LSTM with a simpler architecture. Compared to LSTM, GRU has similar perfor-
mance with fewer parameters, such that is computationally cheaper. In geophysical applications, RNNs are
YU AND MA
10.1029/2021RG000742
11 of 36
Figure 10. The procedure of exploration geophysics. (a) The subsurface structures. The seismic wave is excited at
sources (red point) and propagates downward to the reflector and then propagates upwards until recorded by the
receivers (blue points). (b) The seismic records are after processing. (c) The seismic imaging result, where the lines
stand for the reflectors. (d) Underground properties are interpreted to determine where the reservoir locates.
Reviews of Geophysics
mainly used for predicting the next sample of a temporally or spatially sequenced data set. RNNs are also
used for seismic wavefield or earthquake signal modeling by simulating the time-dependent discrete partial
differential equation.
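For the next-sample prediction task mentioned above, a minimal LSTM regressor in PyTorch might look as follows; the hidden size and the sliding-window setup are illustrative assumptions.

import torch

class NextSamplePredictor(torch.nn.Module):
    """Predict the next sample of a sequence from its history with an LSTM."""
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, seq):              # seq: (batch, time, 1)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])     # predict from the last hidden state

model = NextSamplePredictor()
window = torch.randn(4, 100, 1)          # four windows of 100 past samples each
next_sample = model(window)              # shape (4, 1)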
3. DL Geophysical Applications
The most direct method for applying DL in geophysics is translating geophysical tasks into computer vision tasks, such as denoising or classification. However, in certain geophysical applications, the characteristics of the tasks or data are quite different from those in computer vision. For example, in geophysics, we have large-scale and high-dimensional data but fewer annotated labels. In this section, we introduce how DL approaches relieve the bottlenecks of traditional methods, what difficulties we encounter, and how to solve them. The development of DL applications in exploration geophysics is reviewed first, followed by applications in earthquake science, remote sensing, and other areas.

Figure 11. Comparison of traditional and DL-based methods in exploration geophysics. (a) In random noise denoising tasks, the curvelet denoising method (Herrmann & Hennenfent, 2008) assumes that the signal is sparse under the curvelet transform, and a matching method is used for denoising. In velocity inversion tasks, full-waveform inversion based on the wave equation is used for forward and adjoint modeling in the optimization algorithm. In fault interpretation tasks, faults are picked by interpreters. (b) These tasks are treated as regression problems that are optimized with neural networks. Different tasks may require different neural network architectures.
3.1. Exploration Geophysics
Exploration geophysics images the Earth's subsurface by inverting physical fields collected at the surface, among which seismic wavefields are the most commonly used. Seismic exploration uses reflected seismic waves to predict subsurface structures. The main processes of seismic exploration consist of seismic data sampling and processing (denoising, interpolation, etc.), inversion (migration, imaging, etc.), and interpretation (fault detection, facies classification, etc.). Figure 10 summarizes the procedure of exploration geophysics, and Figure 11 compares traditional and DL-based methods in exploration geophysics.
3.1.1. Seismic Data Processing
Seismic data are contaminated by different types of noise, such as random noise from the background, ground roll that travels along the surface with high energy and masks useful signals, and multiples that reflect multiple times between interfaces. One of the long-standing problems in exploration geophysics is to remove noise and improve the signal-to-noise ratio (SNR) of the signals. Traditional methods use handcrafted filters or regularization to suppress certain kinds of noise by analyzing the corresponding features (Herrmann & Hennenfent, 2008). However, handcrafted filters fail when the signal and noise share a common feature space.

DL methods avoid feature selection when used for seismic denoising. For example, the U-Net-based DeepDenoiser can separate signals and noise by learning a nonlinear regression (Zhu et al., 2019). Moreover, with DnCNN (Zhang, Zuo, Chen, et al., 2017), a CNN for denoising, the same architecture can be used for three kinds of seismic noise while achieving a high SNR (Yu et al., 2019), as long as a corresponding training set is constructed. However, there is still a long way to go. A DNN trained on synthetic datasets does not generalize well to field data. To make the network reusable, transfer learning (Donahue et al., 2014) can be used for field data denoising. Sometimes clean-data labels are difficult to obtain, and one solution is to use multiple trials involving user-generated white noise to simulate real white noise (Wu, Zhang, Lin, Li, & Liu, 2019).

An example of scattered ground-roll attenuation is shown in Figure 12 (Yu et al., 2019). Scattered ground roll is mainly observed in desert areas and is caused by the scattering of ground roll where the near surface is laterally heterogeneous. It is difficult to remove because it occupies the same frequency band as the reflected signals. DnCNN was used to remove the scattered ground roll successfully.

Figure 12. Deep learning for scattered ground-roll attenuation. On the left is the original noisy data set. On the right is the denoised data set. The scattered ground roll marked by the red arrows is removed.
Due to environmental or economic limitations, seismic geophones are usually placed irregularly or not densely enough to satisfy the Nyquist sampling principle. The reconstruction or regularization of seismic data onto a dense and regular grid is essential for improving inversion resolution. Initially, end-to-end DNNs were proposed for the reconstruction of regularly missing data (Wang, Zhang, Lu, et al., 2019) and randomly missing data (Mandelli et al., 2018; Wang, Wang, et al., 2020). However, the training sets are numerically synthetic and do not generalize well to field data. We can instead borrow training data from a natural image data set to train DnCNN and then embed it in the traditional projection onto convex sets (POCS; Abma & Kabir, 2006) framework (Zhang, Yang, et al., 2020). The resulting interpolation algorithm generalizes well to seismic data; moreover, no new networks are required for the interpolation of other datasets. Figure 13 shows the training set and a simple interpolation result (Zhang, Yang, et al., 2020).

Figure 13. The training set and seismic interpolation result (Zhang, Yang, et al., 2020). (a) A subset of the natural image data set, which was used to train a network for seismic data interpolation. (b) An under-sampled seismic record. (c) The interpolated record corresponding to (b). The region from 1.6 to 1.88 s and 1.0 to 1.375 km is enlarged in the top-right corner.
First arrival picking selects the first arrivals of useful signals. It has been automated but needs intense human intervention to check picks with significant static corrections, weak energy, low signal-to-noise ratios, and dramatic phase changes. DL helps improve the automation and accuracy of first arrival picking on realistic seismic data. When DL is used, it is natural to transform first arrival picking into a classification problem by setting the first arrival to one and all other locations to zero (Hu et al., 2019). However, such a setting causes imbalanced labels. An interesting approach instead treats first arrival picking as an image segmentation problem, where everything before the first arrival is set to zero and everything after it is set to one (Wu, Zhang, Li, et al., 2019). This method works well for noisy situations and field datasets. After the segmentation image is obtained, a more advanced picking algorithm, such as an RNN, can be applied to take advantage of the global information (Yuan et al., 2020).
Figure14 shows the results of the first arrival picking based on U-Net. We used 8,000 synthetic seismologi-
cal samples. A gradient constraint was added to the loss function to enhance the continuity of the selected
positions. For the output, three classifications were set: zeros before the first arrival, ones after the first arriv-
al, and twos for the first arrival. The training data set was contaminated with strong noise and had missing
traces. The predicted picking results were close to the labels.
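The loss used in this experiment combines a per-pixel classification term with a lateral-continuity term. A sketch of such a loss follows; the weighting and the exact form of the gradient constraint are assumptions for illustration.

import torch
import torch.nn.functional as F

def picking_loss(logits, labels, lam=0.1):
    """Cross-entropy plus a horizontal-gradient penalty for lateral continuity.

    logits: (batch, 3, nt, nx) outputs for the three classes.
    labels: (batch, nt, nx) integer labels (0 before, 1 after, 2 on the arrival).
    """
    ce = F.cross_entropy(logits, labels)
    probs = torch.softmax(logits, dim=1)
    # Penalize abrupt trace-to-trace changes in the predicted probabilities.
    continuity = (probs[..., 1:] - probs[..., :-1]).abs().mean()
    return ce + lam * continuity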
Additional DL-based seismic signal processing literature that does not belong to the above scope is summarized in this paragraph. Signal compression is essential for the storage and transmission of seismic data. Traditionally, seismic data are stored at 32 bits per sample. With an RNN that estimates the relationships among samples in a seismic trace and compresses the data, only 16 bits are needed for lossless representation, such that half the storage is saved (Payani et al., 2019). Seismic registration aligns seismic images for tasks such as time-lapse studies. However, when large shifts and rapid changes exist, this task is extremely difficult. Borrowing from the concept of optical flow, a CNN can be trained with two seismic images as inputs and the shift as output. The method outperforms traditional methods but is dependent on the training data set (Dhara & Bagaini, 2020).
3.1.2. Seismic Data Imaging
Seismic imaging is a challenging problem, since traditional methods such as tomography and full waveform inversion (FWI) suffer from several bottlenecks: (a) imaging is time-consuming due to the curse of dimensionality; (b) imaging relies heavily on human interaction to select proper velocities; and (c) nonlinear optimization needs a good initialization or low-frequency information, yet recorded data lack low-frequency energy. DL methods help relieve these bottlenecks from several angles.

First, end-to-end DL-based imaging methods use recorded data as inputs and velocity models as outputs, which provides a totally different imaging approach. Such DL methods avoid the bottlenecks mentioned above, providing a next-generation imaging method. The first attempts at DL in stacking (Park & Sacchi, 2019), tomography (Araya-Polo et al., 2018), and FWI (Yang & Ma, 2019) show promising results on synthetic 2D data. One important issue is that the input lies in the data space and the output lies in the model space, both with high-dimensional parameters. U-Net is used to map between these spaces of different dimensions, and downsampling is used to reduce the number of parameters while training the DNN (Yang & Ma, 2019). Figure 15 shows the velocity inversion results from Yang and Ma (2019).

Figure 15. Predicting the velocity model with U-Net from raw seismological data (Yang & Ma, 2019). The columns indicate different velocity models. From top to bottom are the ground-truth velocity models, the seismic records generated from one shot, and the predicted velocity models.
However, end-to-end DL imaging also has disadvantages, such as a lack of training samples and restricted input sizes due to memory limitations. An interesting work used smoothed natural images as velocity models, thus producing a large number of models to construct the training set (Wang & Ma, 2020). Figure 16 shows an example of how Wang and Ma (2020) convert a three-channel color image into a velocity model.

Figure 16. Converting a three-channel color image into a velocity model (Wang & Ma, 2020). (a–c) The original color image, the grayscale image, and the corresponding velocity model. (d) The seismic record generated from a cross-well geometry on (c).
To make DL-based imaging applicable to large-scale inputs, more works aim to collaborate with traditional methods and solve one of the bottlenecks mentioned above, such as extrapolating the frequency range of seismic data from high to low frequencies for FWI (Fang, Zhou, et al., 2020; Ovcharenko et al., 2019) or adding constraints to FWI (Zhang & Alkhalifah, 2019). To mitigate the "curse of dimensionality" in the global optimization of FWI, a CAE is used to reduce the dimension of FWI by optimizing in the latent space (Gao et al., 2019). Another work addresses the high computational cost of forward modeling when a high-order finite-difference method is used: a GAN is used to produce a high-quality wavefield from a low-quality wavefield computed with a lower-order finite difference, in the context of surface-related multiples, ghosts, and dispersion (Siahkoohi et al., 2019). U-Net can also be used for velocity picking in stacking (Figure 17; Wang et al., 2021). The inputs are seismological data, and the outputs have values of one where the picks are located and zero elsewhere.

Figure 17. Velocity picking based on U-Net (Wang et al., 2021). The inputs are the seismological data on the left. The outputs are the picking positions on the right. AP means the approximate root mean square velocity. PD_REG and PD_CLS represent the velocity predictions of the regression network and the classification network, respectively.
An alternative is to replace the FWI objective with an RNN loss function. The structure of an RNN is similar to that of finite-difference time evolution, and the network parameters correspond to the velocity model. Therefore, optimizing an RNN is equivalent to optimizing FWI (Sun, Niu, et al., 2020). Such a strategy has been extended to the simultaneous inversion of velocity and density (Liu, 2020). Figure 18 shows the structure of the modified RNN based on the acoustic wave equation used in Liu (2020); the diagram represents the discretized wave equation implemented in an RNN as a flow chart. The optimization method in FWI can also be learned by a DNN rather than being a handcrafted gradient-descent-based approach (Sun & Alkhalifah, 2020): an ML-descent method is proposed that uses an RNN to exploit the historical information of the gradient rather than handcrafted directions.

Figure 18. A modified recurrent neural network (RNN) based on the acoustic wave equation for wave modeling (Liu, 2020). The diagram represents the discretized wave equation implemented in an RNN. The automatic differentiation mechanism of deep neural networks (DNNs) helps to efficiently optimize the velocity and density.
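The analogy between an RNN and finite-difference time stepping (Figure 18) can be made explicit: each time step of the discretized wave equation acts as one recurrent cell, the velocity model plays the role of the trainable parameters, and automatic differentiation then yields the FWI gradient. A 1D constant-density sketch under these simplifying assumptions (periodic boundaries, a single source and receiver) is given below.

import torch

def forward_model(v, wavelet, dx=10.0, dt=1e-3):
    """Time-step the 1D acoustic wave equation; each step acts as one RNN cell.

    v: (nx,) velocity model, a trainable torch tensor.
    wavelet: (nt,) source wavelet injected at the central grid point.
    """
    nx = v.shape[0]
    src = torch.zeros(nx)
    src[nx // 2] = 1.0                                   # one-hot source location
    u_prev, u, trace = torch.zeros(nx), torch.zeros(nx), []
    for amp in wavelet:
        # Second-order finite differences in space (periodic boundaries) and time.
        lap = (torch.roll(u, -1) - 2 * u + torch.roll(u, 1)) / dx ** 2
        u_next = 2 * u - u_prev + (v * dt) ** 2 * lap + amp * src
        u_prev, u = u, u_next
        trace.append(u[5])                               # record at one receiver
    return torch.stack(trace)

v = torch.full((200,), 2000.0, requires_grad=True)       # initial velocity guess
# Given a source wavelet and observed data (both tensors), the FWI gradient is
# obtained purely by automatic differentiation:
# loss = ((forward_model(v, wavelet) - observed) ** 2).sum(); loss.backward()
# v.grad then holds the gradient used to update the velocity model.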
3.1.3. Seismic Data Interpretation and Attribute Analysis
Seismic interpretation (faults, layers, dips, etc.) and attribute analysis (impedance, frequency, facies, etc.) help extract subsurface geologic information and locate underground sweet spots. However, both tasks are time-consuming, since interventions by experts are required. Preliminary works show that DL has the potential to improve the efficiency and accuracy of seismic interpretation and attribute analysis.
The localization of faults, layers, and dips in seismic interpretation is similar to object detection in computer vision. Therefore, DNNs for image detection can be directly applied in seismic interpretation. However, unlike in the computer vision industry, it is difficult to obtain a public training set or to manually construct a training set from field datasets. Building realistic synthetic datasets rather than handcrafting field datasets is more efficient and can produce similar results; therefore, synthetic samples are used for training. To build an approximately realistic 3D training data set, folding and faulting parameters are chosen randomly within a reasonable range (Wu et al., 2020). The data set is then used to train a 3D U-Net for the seismic structural interpretation of features, such as faults, layers, and dips, in field datasets. If the detected objects make up only a small proportion of the data, a class-balanced binary cross-entropy loss function is used to adjust for the data imbalance so that the network is not trained to predict only zeros (Wu, Liang, et al., 2019), as sketched below. An alternative to a synthetic training set is a semi-automated approach that annotates the targets on a coarse scale and predicts them on a fine scale (Wu, Zhang, Lin, Cao, et al., 2019). An example of fault analysis on synthetic post-stack and field data is shown in Figure 19 (Wu et al., 2020).

Figure 19. (a) A post-stack dataset. (b) The fault prediction result for (a). (c) A synthetic dataset (Wu et al., 2020).
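The class-balanced binary cross-entropy mentioned above reweights the sparse fault pixels so that predicting all zeros is no longer a good minimum. One common form is sketched here; the weighting scheme is a standard choice and not necessarily the exact one used in the cited work.

import torch

def balanced_bce(pred, target, eps=1e-7):
    """Binary cross-entropy with weights set by the fault/non-fault pixel ratio.

    pred: sigmoid outputs in (0, 1); target: binary fault labels, same shape.
    """
    beta = 1.0 - target.mean()            # fraction of non-fault (zero) pixels
    loss = -(beta * target * torch.log(pred + eps)
             + (1.0 - beta) * (1.0 - target) * torch.log(1.0 - pred + eps))
    return loss.mean()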
Attribute analysis is similar to image classification, where seismic images are the inputs and areas labeled with different attributes are the outputs. Therefore, DNNs for image classification can be directly applied in seismic attribute analysis (Das et al., 2019; Feng, Mejer Hansen, et al., 2020; You et al., 2020). If the attributes cannot be directly computed from the seismic data, a DNN can work in a cascaded way (Das & Mukerji, 2020). If labels are not available, a CAE is used for feature extraction, and then a clustering method, such as K-means, is used for unsupervised clustering (Duan et al., 2019; He et al., 2018; Qian et al., 2018). Clustering refers to grouping similar attributes in an unsupervised manner; for example, we can use clustering to decide whether a region contains fluvial facies or faults based on stacked sections. The CAE and K-means can further be optimized simultaneously for better feature extraction (Mousavi, Zhu, Ellsworth, et al., 2019). To mitigate the dependence of vanilla CNNs on the amount of labeled seismic data available, a 1D CycleGAN-based algorithm was proposed for impedance inversion (Wang, Ge, et al., 2019). The CycleGAN does not require a paired training set; only two sets, with and without high fidelity, are needed. To consider the spatial continuity and similarity of adjacent traces, an RNN is used in facies analysis (Li, Lin, et al., 2019).
3.2. Earthquake Science
The goal of earthquake data processing is quite different from that of exploration geophysics; therefore,
this section focuses on DL-based earthquake signal processing. The preliminary processing of earthquake
signals includes classification to distinguish real earthquakes from noise and arrival picking to identify the
arrival times of primary (P) and secondary (S) waves. Further applications involve earthquake location and
Earth tomography. DL has shown promising results in these applications.
3.2.1. Earthquake and Noise Classification
Earthquake signal and noise classification is the most fundamental and difficult task in earthquake early warning (EEW). Traditional EEW systems suffer from false and missed alerts. DNNs can be directly applied to signal and noise discrimination, since it is a classification task. With a sufficient training set, DNNs have achieved precisions of up to 99.2% (Li et al., 2018) and 99.5% (Meier et al., 2019) in different regions. To detect small and weak earthquake signals robustly against strong noise and non-earthquake signals, a residual network with convolutional and recurrent units was developed (Mousavi, Zhu, Sheng, et al., 2019). RNNs and CNNs are also used in a more challenging task: distinguishing between anthropogenic sources, such as mining or quarry blasts, and tectonic seismicity (Linville et al., 2019). More categories of signals need to be identified in specific tasks, such as volcano seismic detection (Titos et al., 2019). Volcano seismic signals can be classified into six classes: long-period events, volcanic tremors, volcano-tectonic events, explosions, hybrid events, and tornillos (Malfante et al., 2018). Uncertainty is also considered in volcano-seismic monitoring (Bueno et al., 2019).
We provide an example of using the wavelet scattering transform (WST) (Mallat, 2012) and a support vector machine for earthquake classification with a limited number of training samples. The WST involves a cascade of wavelet transforms, a modulus operator, and an averaging operator, corresponding to the convolutional filters, nonlinear operator, and pooling operator in a CNN, respectively. The critical difference between the WST and a CNN is that the WST's filters are predesigned by the wavelet transform. In our case, only 100 records were used for training, and 2,000 records were used for testing. We obtained a classification accuracy as high as 93% with the WST method. Figure 20 shows the architecture of the WST algorithm.

Figure 20. The architecture of the wavelet scattering transform (WST). Unlike in a convolutional neural network (CNN), the outputs of the WST are combined from the outputs of each layer. The outputs of the WST then serve as features for a classifier.
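The cascade in Figure 20 (wavelet convolution, modulus, averaging) can be sketched with plain NumPy. The Morlet-like filters, fixed filter length, and two-layer depth are illustrative simplifications of the full scattering transform, not the exact construction used in our experiment.

import numpy as np

def morlet(n, freq, width=6.0):
    """A Morlet-like complex wavelet filter (illustrative design only)."""
    t = np.arange(-n // 2, n // 2)
    return np.exp(2j * np.pi * freq * t) * np.exp(-((t / width) ** 2))

def scattering_features(x, freqs=(0.05, 0.1, 0.2)):
    """Two-layer scattering: cascades of |x * psi| followed by averaging."""
    feats = [np.mean(np.abs(x))]                                    # zeroth order
    for f1 in freqs:
        u1 = np.abs(np.convolve(x, morlet(64, f1), mode="same"))   # wavelet + modulus
        feats.append(u1.mean())                                    # first-order feature
        for f2 in freqs:
            u2 = np.abs(np.convolve(u1, morlet(64, f2), mode="same"))
            feats.append(u2.mean())                                # second-order feature
    # The concatenated features can be fed to a classifier such as an SVM.
    return np.array(feats)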
3.2.2. Arrival Picking
Arrival picking for earthquakes identifies the arrival times of P and S waves. Traditional automated arrival picking algorithms, such as the short-term average/long-term average (STA/LTA) method, are less precise than human experts and rely on threshold settings. DL-based arrival picking overcomes these shortcomings and helps illuminate the Earth's structure clearly (Wang, Xiao, et al., 2019). With a sufficiently large training set, one can achieve picking and classification accuracies remarkably higher than STA/LTA (Zhao et al., 2019; Zhou et al., 2019), even close to or better than human experts (Ross et al., 2018, with a training set of 4.5 million seismograms). If labels are not sufficient, a GAN-based model, EarthquakeGen, can be used to artificially expand labeled data sets (Wang, Zhang, & Li, 2019); the detection accuracy was greatly improved by this artificial augmentation of the training set. Simultaneous earthquake detection and phase picking can further improve the accuracy of both tasks (Mousavi et al., 2020; Zhou et al., 2019).
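For reference, the traditional STA/LTA trigger that these DL pickers are compared against can be written in a few lines of NumPy; the window lengths and the threshold are typical illustrative values rather than universal settings.

import numpy as np

def sta_lta(trace, n_sta=50, n_lta=500, threshold=3.0):
    """Flag samples where the short-term average energy exceeds the long-term
    average energy by a threshold factor (the classic STA/LTA trigger)."""
    energy = np.asarray(trace, dtype=float) ** 2
    sta = np.convolve(energy, np.ones(n_sta) / n_sta, mode="same")   # short window
    lta = np.convolve(energy, np.ones(n_lta) / n_lta, mode="same")   # long window
    ratio = sta / (lta + 1e-12)                                      # avoid division by zero
    return np.flatnonzero(ratio > threshold)                         # candidate onsets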
3.2.3. Earthquake Location and Other Applications
Earthquake location and magnitudes estimation are important in EEW and subsurface imaging. Conven-
tional earthquake location significantly relies on a velocity model and suffers from inaccurate phase picking.
A CNN has been used for earthquake location, taking the waveforms received at several stations as input and a location map as output (Zhang, Zhang, et al., 2020). This method worked well for small earthquakes (ML < 3.0) with low SNRs, for which traditional methods fail. The prediction results and errors of earthquake source locations are indicated in Figure 21. DL also helps estimate earthquake locations and magnitudes based on signals from a single station (Mousavi & Beroza, 2020a; Mousavi & Beroza, 2020b). Further applications include seismic phase association, which groups the phase picks from multiple stations associated with an individual event (Ross et al., 2019), and analysis of the relationship between a strong earthquake and postseismic deformation (Yamaga & Mitsui, 2019).

Figure 21. Locating earthquake sources with deep learning. The black triangles are stations. Left: the blue dots are the actual locations. Right: the red circles are the predicted locations; the radius of a circle represents the predicted epicenter error (Zhang, Zhang, et al., 2020).
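A hedged Keras sketch of such a waveforms-to-location-map network follows. The station count, window length, output grid, and layer sizes are invented for illustration and do not reproduce the cited architecture.

```python
# Illustrative sketch only: multistation waveforms in, epicenter probability map out.
from tensorflow import keras
from tensorflow.keras import layers

n_stations, n_samples, grid = 10, 2048, 64        # assumed dimensions
inp = keras.Input(shape=(n_samples, n_stations))  # one channel per station
x = layers.Conv1D(32, 7, strides=4, activation="relu")(inp)
x = layers.Conv1D(64, 7, strides=4, activation="relu")(x)
x = layers.Conv1D(64, 7, strides=4, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dense(grid * grid, activation="sigmoid")(x)
out = layers.Reshape((grid, grid, 1))(x)          # probability map over the study area
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Training pairs would be waveform windows and synthetic "heat maps" peaked at known epicenters; the predicted map's spread then conveys the location error shown in Figure 21.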
3.3. Remote Sensing—a Geophysical Data Observation Means
Remote sensing is an important means of collecting geophysical data and images using sensors on satellites or aircraft. Remote sensing imagery mainly includes optical images, hyperspectral images, and synthetic aperture radar (SAR) images. Large-scale, high-resolution satellite optical color imagery can be used for precision agriculture and urban planning. To address the issue of object rotation variations, a rotation-invariant CNN for object detection in very high-resolution optical remote sensing images was proposed, where a rotation-invariant layer was introduced by enforcing the training samples before and after rotation to share the same features (Cheng et al., 2016). If the labels are not accurate, a two-step training approach can be used in which the CNN is first initialized with numerous inaccurate reference data and then refined on a small amount of correctly labeled data (Maggiori et al., 2017). To further improve image resolution, image contours were extracted with an edge-enhancement GAN to remove the artifacts and noise in super-resolution (Jiang et al., 2019).
Images obtained by hyperspectral sensors have rich spectral information, such that different land cover categories can potentially be precisely differentiated. In recent years, numerous works have explored DL methods for hyperspectral image classification (Li, Song, et al., 2019). To consider the spectral-spatial structure simultaneously, a 3D CNN rather than a 2D one should be used to extract effective features from hyperspectral imagery (Chen, Jiang, et al., 2016). The extracted features are useful for image classification and target detection and open a new window for future research. An alternative means of exploring the relationships among different spectral channels is to use an RNN, which treats hyperspectral pixels as sequential input data (Mou et al., 2017).
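The spectral-spatial idea can be sketched with a small Conv3D network in Keras, where a patch of pixels together with its full spectrum is treated as a 3D volume. The patch size, band count, kernel shapes, and class count below are illustrative assumptions, not the cited architecture.

```python
# Illustrative spectral-spatial 3D CNN for hyperspectral patch classification.
from tensorflow import keras
from tensorflow.keras import layers

patch, bands, n_classes = 9, 100, 16                 # assumed cube size and class count
inp = keras.Input(shape=(patch, patch, bands, 1))    # spatial x spatial x spectral cube
x = layers.Conv3D(8, (3, 3, 7), activation="relu")(inp)   # joint spectral-spatial filters
x = layers.Conv3D(16, (3, 3, 5), activation="relu")(x)
x = layers.Flatten()(x)
out = layers.Dense(n_classes, activation="softmax")(x)
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

A 2D CNN would collapse the spectral axis into channels; keeping it as a third convolution axis is what lets the filters respond to spectral shape as well as spatial texture.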
SAR systems artificially enlarge the aperture of a radar to produce high-resolution images and can operate in all-weather, day-and-night conditions. CNNs have been used for target classification in SAR images, avoiding handcrafted features and providing higher accuracy (Chen, Wang, et al., 2016). To exploit both the amplitude and phase information of complex SAR imagery, a complex-valued CNN was proposed for SAR image classification with complex-valued inputs (Zhang, Wang, et al., 2017).
3.4. Other AI Geophysical Applications
We investigate more AI geophysical applications in this section. The topics are roughly arranged in order from the Earth to outer space.
3.4.1. The Earth's Structure
Understanding the structure of the Earth is a challenging task since observations are mainly limited to the Earth's surface. The Earth is roughly divided, from the surface inward, into the crust, mantle, and core; however, the detailed structures and properties of the Earth's interior remain unclear. Soil moisture, an important soil attribute, has been hindcast with high fidelity from two recent years of satellite data, showing the potential of LSTMs for hindcasting, data assimilation, and weather forecasting (Fang et al., 2017; Fang, Kifer, et al., 2020). High-resolution 3D CT data are required to determine rock properties but offer only a small field of view; a CycleGAN was proposed to obtain super-resolution images from low-resolution ones by training on an unpaired data set (Niu et al., 2020). Volcanic deformation was detected by using a CNN to classify interferometric fringes in wrapped interferograms (Anantrasirichai et al., 2018). The crustal thickness in eastern Tibet and the western Yangtze craton was estimated from Rayleigh surface wave velocities with a DNN (Cheng et al., 2019). The mantle thermal state of simplified model planets was predicted with DL at an accuracy of 99% for both the mean mantle temperature and the mean surface heat flux relative to the calculated values (Shahnas & Pysklywec, 2020).
3.4.2. Water Resources
Water on Earth has a great impact on ecosystems and natural disasters. DL can help address several major challenges in the water sciences (Shen, 2018). DL can predict the loop current in the ocean by learning patterns in sea surface height (SSH): an LSTM was proposed to predict SSH and the loop current in the Gulf of Mexico within 40 kilometers nine weeks in advance (Wang, Zhuang, et al., 2019). Due to limited computational memory, the region of interest was split into different subregions. Further works directly reconstruct SSH over a large spatial and temporal extent from sparsely sampled data with a CNN (Manucharyan et al., 2021). By using observations from satellites and coastal stations simultaneously, a GAN can be used to reconstruct the SSH of the whole North Sea (Zhang, Stanev, et al., 2020). DL also helps estimate iceberg distribution in the pan-Antarctic near-coastal zone, covering the whole Antarctic continent, for monitoring ice melt and sea level rise (Barbat et al., 2019), and map coastal inundation for a better understanding of the geospatial and temporal characteristics of coastal flooding (Liu et al., 2019).
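A minimal sketch of such a sequence-to-map setup is shown below: an LSTM that takes several past weekly SSH maps of one subregion and predicts the map at a later lead time. The subregion size, history length, and layer width are assumptions for illustration only.

```python
# Illustrative LSTM for SSH prediction on one flattened subregion.
from tensorflow import keras
from tensorflow.keras import layers

n_history, n_pixels = 12, 40 * 40           # assumed: 12 past weekly maps, 40x40 subregion
inp = keras.Input(shape=(n_history, n_pixels))
x = layers.LSTM(256)(inp)                   # summarizes the temporal evolution of SSH
out = layers.Dense(n_pixels)(x)             # SSH map at the target lead time
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
```

Splitting the domain into subregions, as in the cited work, keeps n_pixels small enough for memory; a convolutional-recurrent variant would be the natural next step.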
In addition to the oceans, water is stored in different forms, such as rivers, lakes, rain, and ice. DL has found roles in estimating groundwater storage (Sun et al., 2019) and water storage across the US (Sun, Scanlon, et al., 2020), measuring accurate river widths by super-resolution (Ling et al., 2019), predicting lake water temperature (Read et al., 2019), predicting rainfall and runoff (Akbari Asanjan et al., 2018), and retrieving water vapor from remote sensing data (Acito et al., 2020).
3.4.3. Atmospheric Science
Atmospheric science observes and predicts climate, weather, and atmospheric phenomena. Global observation of atmospheric parameters is difficult since the Earth is extremely large and sensor locations are limited. Researchers chose a CNN-based inpainting algorithm to reconstruct missing values in global climate data sets such as HadCRUT4 (Kadow et al., 2020, Figure 22). Air pollution damages both the Earth's environment and human health; researchers have used DL to estimate ground-level PM2.5 or PM10 concentrations from satellite observations and station measurements (Li et al., 2017; Shen et al., 2018; Tang et al., 2018).

Figure 22. Artificial intelligence (AI) models reconstruct temperature anomalies with many missing values (Kadow et al., 2020).
DL also helps improve the accuracy of weather forecasting, a long-standing challenge in atmospheric science (Bonavita & Laloyaux, 2020; Scher & Messori, 2021). The tracks of typhoons were predicted with a GAN based on satellite images (Rüttgers et al., 2019), producing six-hour-advance tracks with an average error of 95.6 km. Flow-dependent typhoon-induced sea surface temperature cooling was estimated by a DNN and used to improve typhoon predictions (Jiang et al., 2018).
3.4.4. Space Science
Global space parameter estimation and prediction are long-standing tasks in space science. Researchers used a DNN to predict short-term and long-term 3D dynamic electron densities in the inner magnetosphere (Chu et al., 2017); this network can provide the magnetospheric plasma density at any time and location. A regularized GAN was used to reconstruct dynamic total electron content (TEC) maps (Chen et al., 2019), with several existing maps used as references to interpolate missing values in some regions, such as over the oceans. TEC maps can also be predicted two hours in advance with an LSTM (Liu et al., 2020) or one day in advance with a GAN (Lee et al., 2021). Further, a DNN was used to estimate the relationship between electron temperature and electron density in small regions (Hu et al., 2020); because the global electron density is easily measured, it can then be used to predict the global electron temperature.
The geomagnetic storm can be predicted with an LSTM with uncertainty estimation (Tasistro-Hart et al., 2020), providing confidence in the output.
An aurora is an astronomical phenomenon commonly observed in polar areas. Auroras are caused by disturbances in the magnetosphere driven by the solar wind. Auroral classification is important for polar and solar wind research. Researchers used a DNN to classify auroral images (Clausen & Nickisch, 2018, Figure 23). The classification results can further be used to produce an auroral occurrence distribution (Zhong et al., 2020). To handle the situation in which only limited images were annotated, a CycleGAN model was used to extract key local structures from all-sky auroral images (Yang et al., 2019).

Figure 23. The bottom panel shows a keogram from auroral data collected on January 21, 2006, at Rankin Inlet. The keogram consists of a single column from the auroral images at different times. The middle panel shows the probabilities for the six categories as predicted by the ridge classifier trained with the entire training data set. At the top are auroral images at different times (Clausen & Nickisch, 2018).
4. Future Trends and Directions for DL in Geophysics
4.1. The Development Trends of DL in Geophysics
The landmark achievements of DL appeared after 2015, such as VGGNet (Simonyan & Zisserman, 2015), ResNet (He et al., 2016), AlexNet (Krizhevsky et al., 2017), and AlphaGo in 2016. The first introduction of DL in subjects related to geophysics focused on remote sensing in 2016 and 2017 (Chen, Jiang, et al., 2016; Chen, Wang, et al., 2016; Maggiori et al., 2017; Li et al., 2017), since remote sensing is a common technique widely used in many areas. In 2018 and 2019, more geophysical areas, such as exploration geophysics (Araya-Polo et al., 2018) and earthquake studies (Mousavi, Zhu, Sheng, & Beroza, 2019), started to employ DL.
The first attempts started with simple FCNN methods, followed by more complex networks, such as CNN, RNN, and GAN models. With respect to training, early works used end-to-end supervised training borrowed from computer vision, which requires a large number of annotated labels, while recent works have started to consider unsupervised learning (He et al., 2018) and the combination of DL with physical models (Chattopadhyay et al., 2020; Wu & McMechan, 2019). In 2020, more works focused on the uncertainty of DL methods (Cao et al., 2020; Grana et al., 2020; Mousavi & Beroza, 2020a). More examples are listed in Table 2. From these trends, we can conclude that an increasing number of researchers are trying to develop DL methods specifically designed for geophysical tasks to make DL more practical. In the next subsection, we introduce these future trends in detail.

Table 2
Examples of Literature That Use Different Network Architectures (CNN, CAE, U-Net, GAN, and RNN) for Tasks Beyond End-to-End Training

Supervised (end-to-end): Yu et al. (2019); Dhara and Bagaini (2020); Wang, Wang, et al. (2020); Yang and Ma (2019); Wu, Shi, et al. (2019); Siahkoohi et al. (2019); Yuan et al. (2020); Linville et al. (2019)
Semi/unsupervised: Duan et al. (2019); Niu et al. (2020)
Optimization oriented: Xiao et al. (2021); Sun and Alkhalifah (2020); Mousavi, Zhu, Ellsworth, et al. (2019); Sun, Niu, et al. (2020); Wang, McMechan, et al. (2020)
Physical constraint: Zhang, Yang, et al. (2020); Wu and McMechan (2019)
Uncertainty estimation: Mousavi and Beroza (2020a); Tasistro-Hart et al. (2020); Grana et al. (2020)

Note. Here, optimization oriented means using DNNs to optimize traditional model-driven objective functions.
4.2. Future Directions for Deep Learning in Geophysics
DL, as an efficient artificial intelligence technique, is expected to discover geophysical concepts and inherit expert knowledge through machine-assisted mathematical algorithms. Despite the success of DL in some geophysical applications, such as earthquake detectors and pickers, its use as a tool for most practical geophysics is still in its infancy. The main problems include a shortage of training samples, low signal-to-noise ratios, and strong nonlinearity. Among these issues, the critical challenge is the lack of training samples in geophysical applications compared to other industries. Several advanced DL methods have been proposed to address this challenge, such as semi-supervised and unsupervised learning, transfer learning,
multimodal DL, federated learning, and active learning. We suggest that a focus be placed on the subjects
below for future research in the coming decade.
4.2.1. Semi-Supervised and Unsupervised Learning
In practical geophysical applications, obtaining labels for a large data set is time-consuming and can even be infeasible. Therefore, semi-supervised or unsupervised learning is required to relieve the dependence on labels. Dunham et al. (2019) focused on the application of semi-supervised learning in a situation in which the available labels were scarce. A self-training-based label propagation method was proposed, and it outperformed supervised learning methods in which the unlabeled samples were neglected. Semi-supervised learning takes advantage of both labeled and unlabeled data sets. The combination of an AE and K-means is an efficient unsupervised learning method (He et al., 2018; Qian et al., 2018): an autoencoder learns low-dimensional latent features in an unsupervised way, and then K-means clusters the latent features.
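This two-step recipe can be sketched compactly; the sketch below assumes flattened image patches as input, and the latent size and cluster count are arbitrary choices.

```python
# Hedged sketch: autoencoder latent features followed by K-means clustering.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.cluster import KMeans

x = np.random.rand(1000, 64 * 64)                 # placeholder unlabeled patches

inp = keras.Input(shape=(64 * 64,))
z = layers.Dense(32, activation="relu")(inp)      # low-dimensional latent features
rec = layers.Dense(64 * 64, activation="sigmoid")(z)
autoencoder = keras.Model(inp, rec)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=20, batch_size=64, verbose=0)  # unsupervised: input = target

encoder = keras.Model(inp, z)
labels = KMeans(n_clusters=6).fit_predict(encoder.predict(x))  # cluster the latent codes
```

No labels enter at any point; the cluster assignments can then be inspected by an interpreter, for example, as candidate seismic facies.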
4.2.2. Transfer Learning
Usually, we must train one DNN for a specific data set and a specific task. For example, a DNN may effectively process land data but not marine data, or it may be effective in fault detection but not in facies classification. Transfer learning (Donahue et al., 2014) is suggested to increase the reusability of a trained network across different data sets or tasks.

In transfer learning with different data sets, the optimized parameters for one data set can be used as initialization values for learning a new network with another data set; this process is called fine-tuning. Fine-tuning is typically much faster and easier than training a network with randomly initialized weights from scratch. In transfer learning involving different tasks, we assume that the extracted low-level features should be the same across tasks. Therefore, the first layers of a model trained for one task are copied to the new model for another task to reduce the training time. Another benefit of transfer learning is that, with a small number of training samples, we can promptly transfer the learned features to a new task or a new data set. Diagrams of these two transfer learning methods are shown in Figure 24. Further topics in transfer learning include the relationship between the transferability of features (Yosinski et al., 2014) and the distance between different tasks and data sets (Oquab et al., 2014).

Figure 24. Diagrams of transfer learning. (a) Transfer learning between different data sets: the parameters of one trained model are used to initialize another model. (b) Transfer learning between different tasks: the first layers of one trained model are copied to another model.
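In Keras, both flavors reduce to reusing trained weights. The sketch below is illustrative: the base network, the weight file name, and the four-class new task are our assumptions.

```python
# Hedged sketch of the two transfer learning flavors in Figure 24.
from tensorflow import keras
from tensorflow.keras import layers

# stand-in for a network already trained on one data set/task
base = keras.Sequential([layers.Dense(64, activation="relu", input_shape=(32,)),
                         layers.Dense(64, activation="relu"),
                         layers.Dense(1)])
# base.load_weights("land_data_model.h5")   # hypothetical pretrained weights

# (a) New data set, same task: keep all weights and fine-tune with a small learning rate.
base.compile(optimizer=keras.optimizers.Adam(1e-4), loss="mse")

# (b) New task: freeze the first (feature-extracting) layers and retrain a new head.
features = keras.Model(base.inputs, base.layers[-2].output)
features.trainable = False
out = layers.Dense(4, activation="softmax")(features.output)   # e.g., facies classes
classifier = keras.Model(features.inputs, out)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```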
4.2.3. Combination of DL and Traditional Methods
Can we combine traditional and DL approaches to make geophysical mechanics and DL collaborate? Intuitively, such a combination can produce a more precise result than traditional methods and a more reliable result than DL methods alone.
How can DL be incorporated into traditional methods? In a traditional iterative optimization algorithm, the thresholding-based denoiser can be replaced by a DL denoiser (Zhang, Zuo, Gu, et al., 2017) such that the reconstructed results are improved; moreover, different tasks can then share the same denoiser without training a new one for each task. Another technique, deep image prior (DIP), uses a DNN architecture as a constraint on the data and couples it with traditional physical models for different tasks (Lempitsky et al., 2018). Similar to the idea of DIP, Wu and McMechan (2019) showed that a DNN generator can be added to an FWI framework. First, a U-Net-based generator F(Θ; v) with random input v was used to approximate a velocity model m with high accuracy. Then, F(Θ; v) was inserted into the FWI objective function,

E_FWI(Θ) = (1/2) ‖P F(Θ; v) − d_r‖₂²,  (6)

where d_r is the seismic record and P is the forward wavefield propagator. The gradient of E_FWI with respect to the network parameters Θ is calculated with the chain rule. The U-Net is used only to regularize the velocity model; after training, one forward propagation of the network produces a regularized result.
Traditional optimization methods also benefit from the automatic differentiation mechanism in DL, which makes optimization more efficient by replacing conjugate gradient or L-BFGS solvers with DL optimizers such as SGD and Adam (Sun, Niu, et al., 2020; Wang, Chang, et al., 2020). DL has also inspired new directions in the study of traditional nonlinear optimization algorithms, such as ML-descent (Sun & Alkhalifah, 2020) and DL-based adjoint state methods (Xiao et al., 2021).
How can traditional methods be incorporated into DL? With an additional physical constraint on DL methods, fewer training samples are required to obtain a more generalizable inference than with purely data-driven training. Raissi et al. (2019) proposed the physics-informed neural network (PINN), which combines training data and physical-equation constraints during training. Taking wave modeling as an example, the wavefield is represented with a DNN, u(x, t) = F(x, t; Θ), such that the acoustic wave equation becomes

u_tt − c²∇²u = F_tt(x, t; Θ) − c²∇²F(x, t; Θ) = 0,  (7)

which is enforced as a residual term in the loss function.
How can DL and traditional methods cooperate? Another benefit of combining data-driven and model-driven approaches is that we can obtain high-resolution solutions on a large scale. The large-scale process is numerically solved on a low-resolution grid based on physical equations, while the small-scale process is solved by data-driven DL methods (Chattopadhyay et al., 2020). Therefore, the high computational demand of a uniformly fine grid is avoided. DL can also be used for discovering physical concepts (Iten et al., 2020).
It is more common to hear someone ask, "Does machine learning have a real role in hydrological modeling?" rather than, "What role will hydrological science play in the age of machine learning?" (Nearing et al., 2020). As those authors argue, DL has uncovered principles in large-scale rainfall-runoff simulation that physical models cannot yet explain. DL is having a great impact on traditional methods, producing a collision between new and old ideas. We believe that DL and physics-based methods will be used together to move science forward for a long time to come.
4.2.4. Multimodal Deep Learning
To improve the resolution of inversion, the joint inversion of data from different sources has been a popular topic in recent years (Garofalo et al., 2015). One of the advantages of DNNs is that they can fuse information from multiple inputs. In multimodal DL (Ngiam et al., 2011; Ramachandram & Taylor, 2017), inputs come from different sources, such as seismic data and gravity data. Collecting data from different sources can help relieve the bottleneck of a limited number of training samples. Besides, using multimodal data sets can increase the quality and reliability of DL methods (Zhang, Stanev, et al., 2020). Feng, Fang, et al. (2020) used data integration to forecast streamflow, where 23 variables were used, such as precipitation, solar radiation, and temperature. Figure 25 shows an illustration of multimodal DL.

Figure 25. An illustration of multimodal deep learning.
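A hedged Keras sketch of a two-branch multimodal network follows, fusing a seismic image branch and a gravity-measurement branch by concatenating their learned features; all shapes and the regression target are invented for illustration.

```python
# Illustrative multimodal fusion: two input branches, one shared head.
from tensorflow import keras
from tensorflow.keras import layers

seis = keras.Input(shape=(128, 128, 1), name="seismic")   # assumed image-like modality
grav = keras.Input(shape=(64,), name="gravity")           # assumed vector modality

s = layers.Conv2D(16, 3, strides=2, activation="relu")(seis)
s = layers.Conv2D(32, 3, strides=2, activation="relu")(s)
s = layers.GlobalAveragePooling2D()(s)
g = layers.Dense(32, activation="relu")(grav)

fused = layers.Concatenate()([s, g])      # feature-level fusion of the two modalities
out = layers.Dense(1)(fused)              # e.g., a subsurface property to be inverted
model = keras.Model([seis, grav], out)
model.compile(optimizer="adam", loss="mse")
```

Each branch can be sized to its own modality, which is what distinguishes this feature-level fusion from simply stacking the raw inputs.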
4.2.5. Federated Learning
To provide a practical training set for DL in geophysical applications, collecting available data sets from different institutes or corporations might be a possible solution. However, data transfer via the internet is time-consuming and expensive for large-scale geophysical data sets. Besides, most data sets are protected and cannot be shared. Federated learning was first proposed by Google (McMahan et al., 2017; Li et al., 2020) to train a DNN with user data from millions of cellphones without privacy or security issues. The encrypted gradients from different clients are aggregated on a central server, thus avoiding raw data transfer. The server updates the model and distributes it to all clients (Figure 26). In a simple federated learning setting, the clients and the server share the same network architecture. A possible example of federated learning in geophysics: some corporations do not share their annotations of first arrivals; however, they could still benefit by jointly training a DNN for first-arrival picking through federated learning.

Figure 26. Federated learning. The clients train the deep neural network (DNN) with local data sets and upload the model gradients to the server. The server aggregates the gradients and updates the global model. Then, the updated model is distributed to all the local clients. Many rounds of training are performed until the model meets a certain accuracy requirement.
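The aggregation step at the heart of this scheme can be sketched in a few lines. This is plain federated averaging without the encryption layer; the weighting by local data set size is the standard FedAvg choice.

```python
# Hedged sketch of the server-side FedAvg aggregation step.
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-layer weight arrays across clients, weighted by local data size."""
    total = float(sum(client_sizes))
    return [sum(w[k] * n / total for w, n in zip(client_weights, client_sizes))
            for k in range(len(client_weights[0]))]

# one round, assuming Keras models on each client (names hypothetical):
# client_weights = [m.get_weights() for m in client_models]   # trained locally
# global_weights = federated_average(client_weights, client_sizes)
# for m in client_models: m.set_weights(global_weights)       # redistribute
```

Only weights (or gradients) cross the network; the raw first-arrival annotations never leave each corporation's servers.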
4.2.6. Uncertainty Estimation
One of the remaining questions associated with applying DL in geophysics is whether the results of DL-based methods, which lack a solid theoretical foundation, can be trusted. DL-based uncertainty analysis methods include Monte Carlo dropout (Gal & Ghahramani, 2016), Markov chain Monte Carlo (MCMC) (de Figueiredo et al., 2019), variational inference (Subedar et al., 2019), etc. For example, in Monte Carlo dropout, dropout layers are added after the original layers to simulate a Bernoulli distribution and are kept active at inference time. The results of multiple dropout realizations are collected, and their variance is taken as the uncertainty. DL with uncertainty estimation in inference has been reported in areas such as volcano-seismic monitoring (Bueno et al., 2019), geomagnetic storm forecasting (Tasistro-Hart et al., 2020), weather forecasting (Scher & Messori, 2021; Bonavita & Laloyaux, 2020), soil moisture prediction (Fang, Kifer, et al., 2020), and earthquake location estimation (Mousavi & Beroza, 2020b).
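A minimal Keras sketch of Monte Carlo dropout: dropout is kept active at prediction time by calling the model with training=True, and the spread over repeated stochastic passes is reported as the uncertainty. The layer sizes and dropout rate are arbitrary.

```python
# Hedged sketch of Monte Carlo dropout for predictive uncertainty.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(64,))
x = layers.Dense(128, activation="relu")(inp)
x = layers.Dropout(0.2)(x)          # Bernoulli mask, resampled on every forward pass
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.2)(x)
out = layers.Dense(1)(x)
model = keras.Model(inp, out)

x_test = np.random.rand(32, 64).astype("float32")
# training=True keeps dropout active, giving a different realization each call
samples = np.stack([model(x_test, training=True).numpy() for _ in range(100)])
mean, uncertainty = samples.mean(axis=0), samples.std(axis=0)
```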
4.2.7. Active Learning
To train a high-precision model using a small amount of labeled data, active learning has been proposed to imitate the self-learning ability of human beings (Yoo & Kweon, 2019). An active learning model selects the most useful data for manual annotation based on a sampling strategy and adds these data to the training set; the updated data set is then used for the next round of training (Figure 27). One common sampling strategy is based on uncertainty, that is, the samples with the highest predictive uncertainty are selected. Taking fault detection as an example, if a trained network is not sure whether a fault exists at a given location, we can annotate that location manually and add the sample to the training set.

Figure 27. An illustration of active learning. Samples with high uncertainty are chosen and manually annotated to serve as training samples.
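One round of uncertainty-based sampling can be sketched as follows; the model and the annotation (oracle) step are placeholders, and entropy is one of several reasonable uncertainty measures.

```python
# Hedged sketch of one active learning query round (entropy sampling).
import numpy as np

def select_for_annotation(model, pool_x, n_query=20):
    """Pick the pool samples the classifier is least sure about."""
    probs = model.predict(pool_x)                              # (n_pool, n_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)   # high entropy = uncertain
    return np.argsort(entropy)[-n_query:]                      # indices to hand to an expert

# loop: train on the labeled set -> query -> expert annotates -> add to labeled set -> retrain
```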
5. Summary
In this review, the key concepts of DL approaches are introduced, a broad range of applications of DL in geophysics is presented with their pros and cons, and the future trends are discussed for geophysical readers who are beginning their journey in DL. DL methods have created both opportunities and challenges in geophysical fields. Pioneering researchers have provided a basis for DL in geophysics with promising results; more advanced DL technologies and more practical problems must now be explored. To close this study, we summarize a roadmap for applying DL in different geophysical tasks in terms of three levels:

1. Traditional methods are time-consuming and require intensive human labor and expert knowledge, such as in first-arrival selection and velocity selection in exploration geophysics.
2. Traditional methods face difficulties and bottlenecks. For example, geophysical inversion requires good initial values and high-accuracy modeling and suffers from local minima.
3. Traditional methods cannot handle some cases, such as multimodal data fusion and inversion.
With the development of new artificial intelligence models beyond DL and advances in research into the
infinite possibilities of applying DL in geophysics, we can expect intelligent and automatic discoveries of
unknown geophysical principles soon.
Appendix A: A Deep Learning Tutorial for Beginners
A Coding Example of a DnCNN
The implementation of DL algorithms in geophysical data processing is quite simple with existing frameworks such as Caffe, PyTorch, Keras, and TensorFlow. Here, we provide an example of how to use Python and Keras to construct a DnCNN for seismic denoising. The core code requires roughly a dozen lines for data set loading, model construction, training, and testing. The data set is preconstructed and includes a clean subset and a noisy subset; the overall data set includes 12,800 samples with a size of 64 × 64 (available at https://bit.ly/33SyXPO).
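A minimal Keras sketch of such a DnCNN is given below. The file names, block count, and residual-learning setup are illustrative assumptions, not the exact listing or configuration behind the linked data set.

```python
# Hedged DnCNN sketch for seismic denoising; file names are hypothetical.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

noisy = np.load("noisy.npy")   # assumed shape (12800, 64, 64, 1)
clean = np.load("clean.npy")

inp = keras.Input(shape=(64, 64, 1))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
for _ in range(15):                                   # stacked Conv-BN-ReLU blocks
    x = layers.Conv2D(64, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
residual = layers.Conv2D(1, 3, padding="same")(x)     # network predicts the noise
out = layers.Subtract()([inp, residual])              # residual learning: clean = noisy - noise
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
model.fit(noisy, clean, batch_size=64, epochs=30, validation_split=0.2)
denoised = model.predict(noisy[:16])
```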
Any appropriate plotting tool can be used for data visualization. The training takes less than one hour on an NVIDIA 2080 Ti graphics processing unit. Readers can try this code in their own areas as long as a compatible training set is constructed.
Tips for Beginners
We introduce several practical tips for beginners who want to explore DL in geophysics, organized around the three most critical steps in DL: data generation, network construction, and training. Though exploration geophysics is used as the example, the tips for data generation and network training are generally applicable to most areas; network construction generally depends on the task.
Data Generation
As noted by Poulton (2002), "training a feed-forward neural network is 10% of the effort involved in an application; deciding on the input and output data coding and creating good training and testing sets is 90% of the work." In DL, we advise that the percentages of effort for network construction and data set preparation should be roughly 40% and 60%. First, most DL approaches use the original data as input, thus reducing feature extraction efforts. Second, a wider variety of network architectures and parameters can be used in DL compared to traditional neural networks. Overall, constructing a proper training set plays the more prominent role in DL.
Synthetic data sets can be used effectively in DL, which is advantageous since labeled real data sets are sometimes difficult to obtain. First, to assess the applicability of DL in a specific geophysical application, using synthetic data sets is the most convenient approach. Second, if a satisfactory result is obtained with synthetic data sets, a few annotated real data sets can be used for transfer learning via parameter tuning. Third, if the synthetic data sets are sufficiently complicated, that is, if the most important factors are considered when generating them, the trained network may be able to process realistic data sets directly (Wu et al., 2020; Wu, Liang, et al., 2019).
A synthetic training set should be diverse. First, we suggest using an existing synthetic data set with an open license instead of generating a new one; for specific tasks, such as FWI, a data set may need to be generated based on a wave equation. Second, data augmentation methods, such as rotation, reflection, scaling, translation, and adding noise, missing traces, or faults to clean data sets, can be used to expand the training set, as sketched below. The goal is to generate extremely large synthetic data sets that are as close to realistic data sets as possible.
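Several of these augmentations can be written directly in NumPy; the noise level and the decimation pattern below are arbitrary illustrative choices.

```python
# Hedged augmentation sketch for 2D patches (e.g., small seismic sections).
import numpy as np

def augment(patch, rng=np.random.default_rng()):
    """Expand one clean 2D patch into several training variants."""
    out = [patch,
           np.fliplr(patch),                                  # reflection
           np.rot90(patch, 2),                                # 180-degree rotation
           patch + 0.1 * rng.standard_normal(patch.shape)]    # additive noise
    decimated = patch.copy()
    decimated[:, ::4] = 0.0                                   # simulate missing traces
    out.append(decimated)
    return out
```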
To generate realistic data sets, we suggest using existing methods to generate labels that are then checked by a human. For example, in first-arrival picking, an automatic picking algorithm is used to preprocess the data sets, and the results are then given to an expert who identifies the outliers. We also suggest using active learning (Yoo & Kweon, 2019) to provide a semiautomated labeling procedure: first, all machine-annotated data are used to train a DNN, and then the samples with high predicted uncertainty are manually annotated.
Network Construction for Different Tasks
We suggest that beginners use a DnCNN or U-Net for initial tests. DnCNNs are suitable for most tasks in which the input and output share the same domain, such as denoising, interpolation, and attribute analysis. The input size of a DnCNN can vary since no pooling layers are involved; however, each output point is determined by a local receptive field of the input rather than by the entire input. A U-Net, in contrast, contains pooling layers, so all input points can contribute to an output point. U-Nets are suitable even for tasks in which the inputs and outputs lie in different domains, such as FWI. However, the input size of a U-Net is fixed once trained, and the data need to be processed patch-wise.
Combining a CAE and K-means is suggested for unsupervised clustering tasks, such as attribute classification. We do not suggest CycleGAN for geophysical tasks since the training process is extremely time-consuming and the results are not stable. An RNN provides a high-performance framework for time-dependent tasks, such as forward wave modeling and FWI. RNNs are also used for regression and classification tasks involving temporally or spatially sequential data, such as the denoising of a single trace.
To adjust the hyperparameters of a DNN and its optimization algorithm, we suggest using an autoML toolbox, such as AutoKeras, instead of manually adjusting the values. The basic objective is to search for the best parameter combination within a given sampling range. Such a search is exceptionally time-consuming, and a random search strategy may accelerate the tuning process. Moreover, for most applications, the default architecture gives reasonable results.
Training, Validation, and Testing
The available data should be split into three subsets: a training set, a validation set, and a test set. The proportions of the subsets depend on the overall size of the data set. For data sets with 10-50K samples, proportions of 60%, 20%, and 20% are suggested, respectively. For larger data sets (for instance, those larger than 1M samples), much smaller portions (1%-5%) are often used for validation and testing, since larger test/validation sets would waste data that could otherwise be used for training a better model. In a classification task, we suggest using one-hot coding for the labels. The validation set is used to test the network during training; the model with the best validation accuracy is then selected rather than the final trained model. If the validation accuracy stops improving, or decreases after saturating, an early stopping strategy is suggested to avoid overfitting. Network hyperparameters should be tuned according to the validation accuracy. The validation set guides training, while the test set evaluates the model on unseen data; the test set should never be used for hyperparameter tuning.
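These conventions map directly onto a few standard calls; a hedged sketch, with placeholder data standing in for a real data set:

```python
# Hedged sketch: 60/20/20 split, early stopping, and best-validation checkpointing.
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

x, y = np.random.rand(1000, 64), np.random.rand(1000)   # placeholders for real data
x_train, x_tmp, y_train, y_tmp = train_test_split(x, y, test_size=0.4, random_state=0)
x_val, x_test, y_val, y_test = train_test_split(x_tmp, y_tmp, test_size=0.5,
                                                random_state=0)

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10),     # stop on saturation
    keras.callbacks.ModelCheckpoint("best.h5", save_best_only=True),    # keep best-validation model
]
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=200, callbacks=callbacks)
# model.evaluate(x_test, y_test)   # the test set is touched only once, at the very end
```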
Two commonly seen issues during training are as follows: the validation loss is less than the training loss, and the loss becomes not-a-number (NaN). Intuitively, the training loss should be less than the validation loss since the model is trained on the training data. Several potential reasons for the first issue are as follows: (1) regularization, such as dropout, is applied during training but disabled during validation; (2) the training loss is averaged over the batches within an iteration, while the validation loss is computed after the iteration; and (3) the validation set may be less complicated than the training set, especially when only the training set has been augmented. The potential reasons for a NaN loss are as follows: (1) the learning rate is too high; (2) in an RNN, the gradients should be clipped to avoid gradient explosion; and (3) zero is used as a divisor, a negative value is passed to a logarithm, or an exponent is given too large a value.
Glossary
AE Autoencoder; an ANN trained to reproduce its inputs at its outputs.
AI Artificial intelligence; machines taught to think like humans.
ANN Artificial neural network; a computing system inspired by biological neural networks that
constitute animal brains.
Aurora A natural light display in the Earth's sky, caused by disturbances in the magnetosphere driven by the solar wind.
BNN Bayesian neural network; the network parameters are random variables instead of regular
variables.
CAE Convolutional autoencoder; an AE with shared weights.
CNN Convolutional neural network; a DNN with shared weights.
DDTF Data-driven tight frame; A dictionary learning method using a tight frame constraint for the
dictionary.
Deblending In seismic exploration, several explosive sources are shot very close in time to improve efficiency, so the seismic waves from different sources are blended. The recorded data set first needs to be deblended before further processing.
Dictionary A set of vectors used to represent signals as a linear combination.
DIP Deep image prior; the architecture of a DNN is used as a prior constraint for an image.
DL Deep learning; a machine learning technology based on a deep neural network.
DnCNN Denoising convolutional neural network.
DNN Deep neural network; an ANN with many layers between the input and output layers.
DS Double sparsity; the data are represented with a sparse coefficient matrix multiplied by an
adaptive dictionary. The adaptive dictionary is represented by a sparse coefficient matrix
multiplied by a fixed dictionary.
Event In exploration geophysics, a seismic event means reflected waves with the same phase. In seismology, an event means an earthquake that has occurred.
Facies A seismic facies unit is a mapped, three-dimensional seismic unit composed of groups of
reflections whose parameters differ from adjacent facies units.
Fault A discontinuity in a volume of rock across which there has been significant displacement as a result of rock-mass movement.
FCN Fully convolutional network; an FCN is a network that contains no fully connected layers.
Fully connected layers do not share weights.
FCNN Fully connected neural network; an FCNN is a network composed of fully connected layers.
FWI Full waveform inversion; full waveform information is used to obtain subsurface parameters.
FWI is achieved based on the wave equation and inversion theory.
GAN Generative adversarial network; GANs are used to generate fake images. A GAN contains a
generative network and a discriminative network. The generative network tries to produce a
nearly real image. The discriminative network tries to distinguish whether the input image
is real or generated. Therefore, such a game will eventually allow the generative network to
produce fake images that the discriminative network cannot distinguish from real images.
Graphics processing unit (GPU) A parallel computing device. GPUs are widely used for training neural networks in deep learning.
HadCRUT4 Temperature records from the Hadley Centre (sea surface temperature) and the Climatic Research Unit (land surface air temperature).
K-means A classical clustering algorithm, where K is the number of clusters.
K-SVD A dictionary learning method using SVD for dictionary updating.
LSTM Long short-term memory; an LSTM considers how much historical information is forgotten or remembered with adaptive gates.
Magnetosphere Range of the magnetic field surrounding an astronomical object where charged particles
are affected.
ML Earthquake local magnitude; a method for measuring earthquake scale.
Patch In dictionary learning, an image is divided into many patches (blocks) that are the same size
as the atoms in a dictionary.
PINN Physics-informed neural network; a physical equation is used to constrain the neural network.
PM Particulate matter. PM10 refers to coarse particles with a diameter of 10 micrometers or less; PM2.5 refers to fine particles with a diameter of 2.5 micrometers or less.
ResNet Residual neural network; ResNets contain skip connections to jump over several layers. The
output of a residual block is the residual between the input and the direct output.
RNN Recurrent neural network; in time-sequenced data processing applications, RNNs use the
output of a network as the input of the subsequent process to consider the historical context.
SAR Synthetic aperture radar; the motion of a radar antenna over a target is treated as an antenna
with a large aperture. The larger the aperture is, the higher the image resolution will be.
Solar wind A stream of charged particles released from the upper atmosphere of the Sun.
Sparse coding Input data are represented in the form of a linear combination of a dictionary where the
coefficients are sparse.
Sparsity The number of nonzero values in a vector.
SVD Singular value decomposition; a matrix factorization method, A = USV^T, where U and V are orthogonal matrices and S is a diagonal matrix whose elements are the singular values of A. SVD is used for dimension reduction by removing the smaller singular values. SVD is also used in recommendation systems and natural language processing.
Tight frame A frame provides a redundant, stable way of representing a signal, similar to a dictionary. A tight frame is a frame with the perfect reconstruction property, that is, W^T W = I.
Tomography Inversion of the subsurface velocity based on travel time information.
U-Net U-shaped network; U-Nets have U-shaped structures and skip connections. The skip connections bring low-level features to high levels.
Wave equation A partial differential equation that controls wave propagation.
WST Wavelet scattering transform; a transform involving a cascade of wavelet transforms, a modulus operator, and an averaging operator.
Data Availability Statement
Data were neither used nor created for this research.
Acknowledgments
The work was supported in part by the NSFC under grant nos. 41625017 and 41804102, and the National Key Research and Development Program of China under grant nos. 2017YFB0202902 and 2018YFC1503705. The authors thank the Society of Exploration Geophysicists, Nature Research, and the American Association for the Advancement of Science for allowing us to reuse the original figures from their journals.

References
Abma, R., & Kabir, N. (2006). 3D interpolation of irregular data with a POCS algorithm. Geophysics, 71(6), 91–97. https://doi.
org/10.1190/1.2356088
Acito, N., Diani, M., & Corsini, G. (2020). CWV-Net: A deep neural network for atmospheric column water vapor retrieval from hyper-
spectral VNIR data. IEEE Transactions on Geoscience and Remote Sensing, 58(11), 8163–8175. https://doi.org/10.1109/tgrs.2020.2987905
Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation.
IEEE Transactions on Signal Processing, 54(11), 4311–4322. https://doi.org/10.1109/tsp.2006.881199
Akbari Asanjan, A., Yang, T., Hsu, K., Sorooshian, S., Lin, J., & Peng, Q. (2018). Short-term precipitation forecast based on the PER-
SIANN system and LSTM recurrent neural networks. Journal of Geophysical Research: Atmosphere, 123(22), 12543–12563. https://doi.
org/10.1029/2018jd028375
Anantrasirichai, N., Biggs, J., Albino, F., Hill, P., & Bull, D. (2018). Application of machine learning to classification of volcanic defor-
mation in routinely-generated InSAR data. Journal of Geophysical Research, 123(8), 6592–6606. https://doi.org/10.1029/2018JB015911
Araya-Polo, M., Jennings, J., Adler, A., & Dahlke, T. (2018). Deep-learning tomography. The Leading Edge, 37(1), 58–66. https://doi.
org/10.1190/tle37010058.1
Barbat, M. M., Rackow, T., Hellmer, H. H., Wesche, C., & Mata, M. M. (2019). Three years of near-coastal Antarctic iceberg distribution
from a machine learning approach applied to SAR imagery. Journal of Geophysical Research: Oceans, 124(9), 6658–6672. https://doi.
org/10.1029/2019jc015205
Bergen, K. J., Johnson, P. A., de Hoop, M. V., & Beroza, G. C. (2019). Machine learning for data-driven discovery in solid earth geoscience.
Science, 363(6433), 1–10. https://doi.org/10.1126/science.aau0323
Bonavita, M., & Laloyaux, P. (2020). Machine learning for model error inference and correction. Journal of Advances in Modeling Earth
Systems, 12(12), e2020MS002232. https://doi.org/10.1029/2020ms002232
Bueno, A., Benitez, C., De Angelis, S., Moreno, A. D., & Ibanez, J. M. (2019). Volcano-seismic transfer learning and uncertainty quantifi-
cation with Bayesian neural networks. IEEE Transactions on Geoscience and Remote Sensing, 58(2), 892–902. https://doi.org/10.1109/
TGRS.2019.2941494
Cai, J., Ji, H., Shen, Z., & Ye, G. (2014). Data-driven tight frame construction and image denoising. Applied and Computational Harmonic
Analysis, 37(1), 89–105. https://doi.org/10.1016/j.acha.2013.10.001
Cao, R., Earp, S., de Ridder, S. A. L., Curtis, A., & Galetti, E. (2020). Near-real-time near-surface 3D seismic velocity and uncertainty
models by wavefield gradiometry and neural network inversion of ambient seismic noise. Geophysics, 85(1), KS13–KS27. https://doi.
org/10.1190/geo2018-0562.1
Chattopadhyay, A., Subel, A., & Hassanzadeh, P. (2020). Data-driven super-parameterization using deep learning: Experimentation with
multiscale Lorenz 96 systems and transfer learning. Journal of Advances in Modeling Earth Systems, 12(11), e2020MS002084. https://
doi.org/10.1029/2020ms002084
Chen, R. T., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural ordinary differential equations. (pp. 6572–6583). In Proceedings of the
32nd International Conference on Neural Information Processing Systems (NIPS'18). https://dl.acm.org/doi/10.5555/3327757.3327764
Chen, S., Wang, H., Xu, F., & Jin, Y. (2016). Target classification using the deep convolutional networks for SAR images. IEEE Transactions
on Geoscience and Remote Sensing, 54(8), 4806–4817. https://doi.org/10.1109/tgrs.2016.2551720
Chen, Y., Jiang, H., Li, C., Jia, X., & Ghamisi, P. (2016). Deep feature extraction and classification of hyperspectral images based on
convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 54(10), 6232–6251. https://doi.org/10.1109/
tgrs.2016.2584107
Chen, Z., Jin, M., Deng, Y., Wang, J.-S., Huang, H., Deng, X., & Huang, C.-M. (2019). Improvement of a deep learning algorithm for total
electron content maps: Image completion. Journal of Geophysical Research, 124(1), 790–800. https://doi.org/10.1029/2018ja026167
Cheng, G., Zhou, P., & Han, J. (2016). Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote
sensing images. IEEE Transactions on Geoscience and Remote Sensing, 54(12), 7405–7415. https://doi.org/10.1109/tgrs.2016.2601622
Cheng, X., Liu, Q., Li, P., & Liu, Y. (2019). Inverting Rayleigh surface wave velocities for crustal thickness in eastern Tibet and the west-
ern Yangtze craton based on deep learning neural networks. Nonlinear Processes in Geophysics, 26(2), 61–71. https://doi.org/10.5194/
npg-26-61-2019
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations
using RNN encoder-decoder for statistical machine translation. (pp. 1724–1734). In Proceedings of the 2014 Conference on Empirical
Methods in Natural Language Processing (EMNLP). https://doi.org/10.3115/v1/D14-1179 arXiv preprint arXiv:1406.1078.
Chu, X., Bortnik, J., Li, W., Ma, Q., Denton, R., Yue, C., etal. (2017). A neural network model of three-dimensional dynamic electron den-
sity in the inner magnetosphere. Journal of Geophysical Research, 122(9), 9183–9197. https://doi.org/10.1002/2017ja024464
Clausen, L. B. N., & Nickisch, H. (2018). Automatic classification of auroral images from the oslo auroral themis (oath) data set using
machine learning. Journal of Geophysical Research: Space Physics, 123(7), 5640–5647. https://doi.org/10.1029/2018ja025274
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An over-
view. IEEE Signal Processing Magazine, 35(1), 53–65. https://doi.org/10.1109/MSP.2017.2765202
Das, V., & Mukerji, T. (2020). Petrophysical properties prediction from prestack seismic data using convolutional neural networks. Geo-
physics, 85(5), N41–N55. https://doi.org/10.1190/geo2019-0650.1
Das, V., Pollack, A., Wollner, U., & Mukerji, T. (2019). Convolutional neural network for seismic impedance inversion. Geophysics, 84(6),
R869–R880. https://doi.org/10.1190/geo2018-0838.1
de Figueiredo, L. P., Grana, D., Roisenberg, M., & Rodrigues, B. B. (2019). Gaussian mixture Markov chain Monte Carlo method for linear
seismic inversion. Geophysics, 84(3), R463–R476. https://doi.org/10.1190/geo2018-0529.1
DeVries, P. M. R., Viegas, F., Wattenberg, M., & Meade, B. J. (2018). Deep learning of aftershock patterns following large earthquakes.
Nature, 560(7720), 632–634. https://doi.org/10.1038/s41586-018-0438-y
Dhara, A., & Bagaini, C. (2020). Seismic image registration using multiscale convolutional neural networks. Geophysics, 85(6), V425–V441.
https://doi.org/10.1190/geo2019-0724.1
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., & Darrell, T. (2014). Decaf: A deep convolutional activation feature for generic visual recogni-
tion. International Conference on Machine Learning, 647–655. https://dl.acm.org/doi/10.5555/3044805.3044879
Dong, C., Loy, C. C., He, K., & Tang, X. (2014). Learning a deep convolutional network for image super-resolution. (pp. 184–199). European
Conference on Computer Vision. https://doi.org/10.1007/978-3-319-10593-2_13
Donoho, D. L., & Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical
Association, 90(432), 1200–1224. https://doi.org/10.1080/01621459.1995.10476626
Duan, Y., Zheng, X., Hu, L., & Sun, L. (2019). Seismic facies analysis based on deep convolutional embedded clustering. Geophysics, 84(6),
IM87–IM97. https://doi.org/10.1190/geo2018-0789.1
Dunham, M. W., Malcolm, A., & Kim Welford, J. (2019). Improved well-log classification using semisupervised label propagation and
self-training, with comparisons to popular supervised algorithms. Geophysics, 85(1), O1–O15. https://doi.org/10.1190/geo2019-0238.1
Fang, J., Zhou, H., Elita Li, Y., Zhang, Q., Wang, L., Sun, P., & Zhang, J. (2020). Data-driven low-frequency signal recovery using deep-learn-
ing predictions in full-waveform inversion. Geophysics, 85(6), A37–A43. https://doi.org/10.1190/geo2020-0159.1
Fang, K., Kifer, D., Lawson, K., & Shen, C. (2020). Evaluating the potential and challenges of an uncertainty quantification method
for long short-term memory models for soil moisture predictions. Water Resources Research, 56(12), e2020WR028095. https://doi.
org/10.1029/2020wr028095
Fang, K., Shen, C., Kifer, D., & Yang, X. (2017). Prolongation of SMAP to spatiotemporally seamless coverage of continental U.S. using a
deep learning neural network. Geophysical Research Letters, 44(21), 11030–11039. https://doi.org/10.1002/2017gl075619
Feng, D. P., Fang, K., & Shen, C. P. (2020). Enhancing streamflow forecast and extracting insights using long-short term memory networks
with data integration at continental scales. Water Resources Research, 56(9), e2019WR026793. https://doi.org/10.1029/2019wr026793
Feng, R., Mejer Hansen, T., Grana, D., & Balling, N. (2020). An unsupervised deep-learning method for porosity estimation based on post-
stack seismic data. Geophysics, 85(6), M97–M105. https://doi.org/10.1190/geo2020-0121.1
Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International
conference on machine learning. (pp. 1050–1059). PMLR. https://dl.acm.org/doi/10.5555/3045390.3045502
Gao, Z., Pan, Z., Gao, J., & Xu, Z. (2019). Building long-wavelength velocity for salt structure using stochastic full waveform inversion
with deep autoencoder based model reduction. In SEG technical program expanded abstracts. (pp. 1680–1684). Society of Exploration
Geophysicists. https://doi.org/10.1190/segam2019-3215572.1
Garofalo, F., Sauvin, G., Socco, L. V., & Lecomte, I. (2015). Joint inversion of seismic and electric data applied to 2D media. Geophysics,
80(4), EN93–EN104. https://doi.org/10.1190/geo2014-0313.1
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., etal., (2014). Generative adversarial networks. Advances in
Neural Information Processing Systems. 2672–2680. https://dl.acm.org/doi/10.5555/2969033.2969125
Grana, D., Azevedo, L., & Liu, M. (2020). A comparison of deep machine learning and Monte Carlo methods for facies classification from
seismic data. Geophysics, 85(4), WA41–WA52. https://doi.org/10.1190/geo2019-0405.1
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE conference on computer vision and pattern
recognition. (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90
He, Y., Cao, J., Lu, Y., Gan, Y., & Lv, S. (2018). Shale seismic facies recognition technology based on sparse autoencoder. In International
Geophysical Conference. (pp. 1744–1748) Society of Exploration Geophysicists and Chinese Petroleum Society. https://doi.org/10.1190/
IGC2018-428
Helmy, T., Fatai, A., & Faisal, K. (2010). Hybrid computational models for the characterization of oil and gas reservoirs. Expert Systems with
Applications, 37(7), 5353–5363. https://doi.org/10.1016/j.eswa.2010.01.021
Herrmann, F. J., & Hennenfent, G. (2008). Non-parametric seismic data recovery with curvelet frames. Geophysical Journal International,
173(1), 233–248. https://doi.org/10.1111/j.1365-246x.2007.03698.x
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
https://doi.org/10.1162/neco.1997.9.8.1735
Hu, A., Carter, B., Currie, J., Norman, R., Wu, S., & Zhang, K. (2020). A deep neural network model of global topside electron temperature
using incoherent scatter radars and its application to GNSS radio occultation. Journal of Geophysical Research, 125(2), 1–17. https://doi.
org/10.1029/2019ja027263
Hu, L., Zheng, X., Duan, Y., Yan, X., Hu, Y., & Zhang, X. (2019). First-arrival picking with a U-Net convolutional network. Geophysics, 84(6),
U45–U57. https://doi.org/10.1190/geo2018-0688.1
Huang, K., You, J., Chen, K., Lai, H., & Don, A. (2006). Neural network for parameters determination and seismic pattern detection (pp.
2285–2289). SEG Technical Program Expanded Abstracts.
Iten, R., Metger, T., Wilming, H., Del Rio, L., & Renner, R. (2020). Discovering physical concepts with neural networks. Physical Review
Letters, 124(1), 010508. https://doi.org/10.1103/physrevlett.124.010508
Jia, Y., & Ma, J. (2017). What can machine learning do for seismic data processing? An interpolation application. Geophysics, 82(3), V163–
V177. https://doi.org/10.1190/geo2016-0300.1
Jiang, G. Q., Xu, J., & Wei, J. (2018). A deep learning algorithm of neural network for the parameterization of typhoon-ocean feedback in
typhoon forecast models. Geophysical Research Letters, 45(8), 3706–3716. https://doi.org/10.1002/2018gl077004
Jiang, K., Wang, Z., Yi, P., Wang, G., Lu, T., & Jiang, J. (2019). Edge-enhanced GAN for remote sensing image superresolution. IEEE Trans-
actions on Geoscience and Remote Sensing, 57(8), 5799–5812. https://doi.org/10.1109/tgrs.2019.2902431
Kadow, C., Hall, D. M., & Ulbrich, U. (2020). Artificial intelligence reconstructs missing climate information. Nature Geoscience, 13(6),
408–413. https://doi.org/10.1038/s41561-020-0582-5
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). Imagenet classification with deep convolutional neural networks. Communications of
the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lee, S., Ji, E. Y., Moon, Y. J., & Park, E. (2021). One-day forecasting of global TEC using a novel deep learning model. Space Weather, 19(1),
2020SW002600. https://doi.org/10.1029/2020sw002600
Lei, N., An, D., Guo, Y., Su, K., Liu, S., Luo, Z., et al. (2020). A geometric understanding of deep learning. Engineering, 6(3), 361–374.
https://doi.org/10.1016/j.eng.2019.09.010
Lempitsky, V., Vedaldi, A., & Ulyanov, D. (2018). Deep image prior. In Proceedings of the IEEE conference on computer vision and pattern
recognition. (pp. 9446–9454). https://doi.org/10.1109/CVPR.2018.00984
Li, J., Bao, Q., Liu, Y., Wu, G., Wang, L., He, B., et al. (2019). Evaluation of FAMIL2 in simulating the climatology and seasonal-to-inter-
annual variability of tropical cyclone characteristics. Journal of Advances in Modeling Earth Systems, 11(4), 1117–1136. https://doi.
org/10.1029/2018ms001506
Li, L., Lin, Y., Zhang, X., Liang, H., Xiong, W., & Zhan, S. (2019).Convolutional recurrent neural networks based waveform classification
in seismic facies analysis. (pp. 2599–2603). SEG Technical Program Expanded Abstracts. https://doi.org/10.1190/segam2019-3215237.1
Li, S., Song, W., Fang, L., Chen, Y., Ghamisi, P., & Benediktsson, J. A. (2019). Deep learning for hyperspectral image classification: An over-
view. IEEE Transactions on Geoscience and Remote Sensing, 57(9), 6690–6709. https://doi.org/10.1109/tgrs.2019.2907932
Li, T., Sahu, A. K., Talwalkar, A., & Smith, V. (2020). Federated learning: Challenges, methods, and future directions. IEEE Signal Process-
ing Magazine, 37(3), 50–60. https://doi.org/10.1109/msp.2020.2975749
Li, T., Shen, H., Yuan, Q., Zhang, X., & Zhang, L. (2017). Estimating ground-level PM2.5 by fusing satellite and station observations: A
geo-intelligent deep learning approach. Geophysical Research Letters, 44(23), 11985–11993. https://doi.org/10.1002/2017gl075710
Li, Z., Meier, M. A., Hauksson, E., Zhan, Z., & Andrews, J. (2018). Machine learning seismic wave discrimination: Application to earth-
quake early warning. Geophysical Research Letters, 45(10), 4773–4779. https://doi.org/10.1029/2018gl077870
Liang, J., Ma, J., & Zhang, X. (2014). Seismic data restoration via data-driven tight frame. Geophysics, 79(3), V65–V74. https://doi.
org/10.1190/geo2013-0252.1
Lim, J. S. (2005). Reservoir properties determination using fuzzy logic and neural networks from well data in offshore Korea. Journal of
Petroleum Science and Engineering, 49(3–4), 182–192. https://doi.org/10.1016/j.petrol.2005.05.005
Ling, F., Boyd, D., Ge, Y., Foody, G. M., Li, X., Wang, L., etal. (2019). Measuring river wetted width from remotely sensed imagery at the sub-
pixel scale with a deep convolutional neural network. Water Resources Research, 55(7), 5631–5649. https://doi.org/10.1029/2018wr024136
Linville, L., Pankow, K., & Draelos, T. (2019). Deep learning models augment analyst decisions for event discrimination. Geophysical Re-
search Letters, 46(7), 3643–3651. https://doi.org/10.1029/2018gl081119
Liu, B., Li, X., & Zheng, G. (2019). Coastal inundation mapping from bitemporal and dual-polarization SAR imagery based on deep convo-
lutional neural networks. Journal of Geophysical Research: Oceans, 124(12), 9101–9113. https://doi.org/10.1029/2019jc015577
Liu, L., Zou, S., Yao, Y., & Wang, Z. (2020). Forecasting global ionospheric TEC using deep learning approach. Space Weather, 18(11),
e2020SW002501. https://doi.org/10.1029/2020sw002501
Liu, S. (2020). Multi-parameter full waveform inversions based on recurrent neural networks (Master's thesis). Harbin Institute of Technology, China.
Maggiori, E., Tarabalka, Y., Charpiat, G., & Alliez, P. (2017). Convolutional neural networks for large-scale remote-sensing image classifi-
cation. IEEE Transactions on Geoscience and Remote Sensing, 55(2), 645–657. https://doi.org/10.1109/tgrs.2016.2612821
Makhzani, A. (2018). Unsupervised representation learning with autoencoders. (Doctoral dissertation), University of Toronto (Canada).
Malfante, M., Dalla Mura, M., Mars, J. I., Metaxian, J. P., Macedo, O., & Inza, A. (2018). Automatic classification of volcano seismic signa-
tures. Journal of Geophysical Research: Solid Earth, 123(12), 10645–10658. https://doi.org/10.1029/2018jb015470
Mallat, S. (2012). Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10), 1331–1398. https://doi.
org/10.1002/cpa.21413
Mandelli, S., Borra, F., Lipari, V., Bestagini, P., Sarti, A., & Tubaro, S. (2018). Seismic data interpolation through convolutional autoencoder. In SEG technical program expanded abstracts (pp. 4101–4105). https://doi.org/10.1190/segam2018-2995428.1
Manucharyan, G. E., Siegelman, L., & Klein, P. (2021). A deep learning approach to spatiotemporal sea surface height interpolation and
estimation of deep currents in geostrophic ocean turbulence. Journal of Advances in Modeling Earth Systems, 13(1), e2019MS001965.
https://doi.org/10.1029/2019ms001965
McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. Y. (2017). Communication-efficient learning of deep networks from decentralized data. In International conference on artificial intelligence and statistics (pp. 1273–1282). PMLR.
Meier, M. A., Ross, Z. E., Ramachandran, A., Balakrishna, A., Nair, S., Kundzicz, P., et al. (2019). Reliable real-time seismic signal/noise discrimination with machine learning. Journal of Geophysical Research: Solid Earth, 124(1), 788–800. https://doi.org/10.1029/2018jb016661
Mou, L., Ghamisi, P., & Zhu, X. X. (2017). Deep recurrent neural networks for hyperspectral image classification. IEEE Transactions on
Geoscience and Remote Sensing, 55(7), 3639–3655. https://doi.org/10.1109/tgrs.2016.2636241
Mousavi, S. M., & Beroza, G. C. (2020a). A machine-learning approach for earthquake magnitude estimation. Geophysical Research Letters,
47(1), e2019GL085976. https://doi.org/10.1029/2019gl085976
Mousavi, S. M., & Beroza, G. C. (2020b). Bayesian-deep-learning estimation of earthquake location from single-station observations. IEEE
Transactions on Geoscience and Remote Sensing, 58(11), 8211–8224. https://doi.org/10.1109/tgrs.2020.2988770
Mousavi, S. M., Ellsworth, W. L., Zhu, W., Chuang, L. Y., & Beroza, G. C. (2020). Earthquake transformer—An attentive deep-learn-
ing model for simultaneous earthquake detection and phase picking. Nature Communications, 11(1), 1–12. https://doi.org/10.1038/
s41467-020-17591-w
Mousavi, S. M., Horton, S. P., Langston, C. A., & Samei, B. (2016). Seismic features and automatic discrimination of deep and shallow
induced-microearthquakes using neural network and logistic regression. Geophysical Journal International, 207(1), 29–46. https://doi.
org/10.1093/gji/ggw258
Mousavi, S. M., & Langston, C. A. (2016). Hybrid seismic denoising using higher-order statistics and improved wavelet block thresholding.
Bulletin of the Seismological Society of America, 106(4), 1380–1393. https://doi.org/10.1785/0120150345
Mousavi, S. M., & Langston, C. A. (2017). Automatic noise-removal/signal-removal based on general cross-validation thresholding in syn-
chrosqueezed domain and its application on earthquake data. Geophysics, 82(4), V211–V227. https://doi.org/10.1190/geo2016-0433.1
Mousavi, S. M., Langston, C. A., & Horton, S. P. (2016). Automatic microseismic denoising and onset detection using the synchrosqueezed
continuous wavelet transform. Geophysics, 81(4), V341–V355. https://doi.org/10.1190/geo2015-0598.1
Mousavi, S. M., Zhu, W., Ellsworth, W., & Beroza, G. (2019). Unsupervised clustering of seismic signals using deep convolutional autoen-
coders. IEEE Geoscience and Remote Sensing Letters, 16(11), 1693–1697. https://doi.org/10.1109/lgrs.2019.2909218
Mousavi, S. M., Zhu, W., Sheng, Y., & Beroza, G. C. (2019). CRED: A deep residual network of convolutional and recurrent units for earth-
quake signal detection. Scientific Reports, 9(1), 1–14. https://doi.org/10.1038/s41598-019-45748-1
Nazari Siahsar, M. A., Gholtashi, S., Kahoo, A. R., Chen, W., & Chen, Y. (2017). Data-driven multitask sparse dictionary learning for noise
attenuation of 3D seismic data. Geophysics, 82(6), V385–V396. https://doi.org/10.1190/geo2017-0084.1
Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., et al. (2020). What role does hydrological science play in the age of machine learning? Water Resources Research, 57, e2020WR028091. https://doi.org/10.1029/2020WR028091
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A. Y. (2011). Multimodal deep learning. In International conference on machine learning (pp. 689–696). https://dl.acm.org/doi/10.5555/3104482.3104569
Niu, Y., Wang, Y. D., Mostaghimi, P., Swietojanski, P., & Armstrong, R. T. (2020). An innovative application of generative adversarial net-
works for physically accurate rock images with an unprecedented field of view. Geophysical Research Letters, 47(23), e2020GL089029.
https://doi.org/10.1029/2020gl089029
Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In IEEE conference on computer vision and pattern recognition (pp. 1717–1724). https://doi.org/10.1109/CVPR.2014.222
Oropeza, V., & Sacchi, M. (2011). Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis.
Geophysics, 76(3), V25–V32. https://doi.org/10.1190/1.3552706
Ovcharenko, O., Kazei, V., Kalita, M., Peter, D., & Alkhalifah, T. (2019). Deep learning for low-frequency extrapolation from multioffset
seismic data. Geophysics, 84(6), R989–R1001. https://doi.org/10.1190/geo2018-0884.1
Park, M. J., & Sacchi, M. D. (2019). Automatic velocity analysis using convolutional neural network and transfer learning. Geophysics,
85(1), V33–V43. https://doi.org/10.1190/geo2018-0870.1
Payani, A., Fekri, F., Alregib, G., Mohandes, M., & Deriche, M. (2019). Compression of seismic signals via recurrent neural networks: Lossy and lossless algorithms. In SEG technical program expanded abstracts (pp. 4082–4086). https://doi.org/10.1190/segam2019-3207380.1
Poulton, M. M. (2002). Neural networks as an intelligence amplification tool: A review of applications. Geophysics, 67(3), 979–993. https://
doi.org/10.1190/1.1484539
Qi, J., Zhang, B., Lyu, B., & Marfurt, K. (2020). Seismic attribute selection for machine-learning-based facies analysis. Geophysics, 85(2),
O17–O35. https://doi.org/10.1190/geo2019-0223.1
Qian, F., Yin, M., Liu, X., Wang, Y., Lu, C., & Hu, G. (2018). Unsupervised seismic facies analysis via deep convolutional autoencoders.
Geophysics, 83(3), A39–A43. https://doi.org/10.1190/geo2017-0524.1
Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward
and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707. https://doi.
org/10.1016/j.jcp.2018.10.045
Ramachandram, D., & Taylor, G. W. (2017). Deep multimodal learning: A survey on recent advances and trends. IEEE Signal Processing
Magazine, 34(6), 96–108. https://doi.org/10.1109/msp.2017.2738401
Read, J. S., Jia, X., Willard, J., Appling, A. P., Zwart, J. A., Oliver, S. K., et al. (2019). Process-guided deep learning predictions of lake water temperature. Water Resources Research, 55(11), 9173–9190. https://doi.org/10.1029/2019wr024922
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., & Prabhat (2019). Deep learning and process understand-
ing for data-driven earth system science. Nature, 566(7743), 195–204. https://doi.org/10.1038/s41586-019-0912-1
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention (pp. 234–241). Springer. https://doi.org/10.1007/978-3-319-24574-4_28
Ross, Z. E., Meier, M.-A., & Hauksson, E. (2018). P wave arrival picking and first-motion polarity determination with deep learning. Jour-
nal of Geophysical Research: Solid Earth, 123(6), 5120–5129. https://doi.org/10.1029/2017jb015251
Ross, Z. E., Yue, Y. S., Meier, M. A., Hauksson, E., & Heaton, T. H. (2019). PhaseLink: A deep learning approach to seismic phase association. Journal of Geophysical Research: Solid Earth, 124(1), 856–869. https://doi.org/10.1029/2018jb016674
Rubinstein, R., Zibulevsky, M., & Elad, M. (2010). Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE
Transactions on Signal Processing, 58(3), 1553–1564. https://doi.org/10.1109/tsp.2009.2036477
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
https://doi.org/10.1038/323533a0
Rüttgers, M., Lee, S., Jeon, S., & You, D. (2019). Prediction of a typhoon track using a generative adversarial network and satellite images.
Scientific Reports, 9(1), 1–15. https://doi.org/10.1038/s41598-019-42339-y
Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. Advances in Neural Information Processing Systems.
https://dl.acm.org/doi/10.5555/3294996.3295142
Scher, S., & Messori, G. (2021). Ensemble methods for neural network-based weather forecasts. Journal of Advances in Modeling Earth
Systems, 13(2), e2020MS002331. https://doi.org/10.1029/2020MS002331
Shahnas, M. H., & Pysklywec, R. N. (2020). Toward a unified model for the thermal state of the planetary mantle: Estimations from mean
field deep learning. Earth and Space Science, 7(7), e2019EA000881. https://doi.org/10.1029/2019ea000881
Shen, C. (2018). A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resources
Research, 54(11), 8558–8593. https://doi.org/10.1029/2018wr022643
Shen, H., Li, T., Yuan, Q., & Zhang, L. (2018). Estimating regional ground-level PM2.5 directly from satellite top-of-atmosphere reflectance using deep belief networks. Journal of Geophysical Research: Atmospheres, 123(24), 13875–13886. https://doi.org/10.1029/2018jd028759
Siahkoohi, A., Louboutin, M., & Herrmann, F. J. (2019). The importance of transfer learning in seismic modeling and imaging. Geophysics,
84(6), A47–A52. https://doi.org/10.1190/geo2019-0056.1
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on
learning representations.
Spitz, S. (1991). Seismic trace interpolation in the f-x domain. Geophysics, 56(6), 785–794. https://doi.org/10.1190/1.1443096
Subedar, M., Krishnan, R., Meyer, P. L., Tickoo, O., & Huang, J. (2019). Uncertainty-aware audiovisual activity recognition using deep Bayesian variational inference. In International conference on computer vision (pp. 6300–6309). https://doi.org/10.1109/ICCV.2019.00640
Sun, A. Y., Scanlon, B. R., Save, H., & Rateb, A. (2020). Reconstruction of GRACE total water storage through automated machine learning. Water Resources Research, 57, e2020WR028666. https://doi.org/10.1029/2020WR028666
Sun, A. Y., Scanlon, B. R., Zhang, Z., Walling, D., Bhanja, S. N., Mukherjee, A., & Zhong, Z. (2019). Combining physically based modeling and deep learning for fusing GRACE satellite data: Can we learn from mismatch? Water Resources Research, 55(2), 1179–1195. https://doi.org/10.1029/2018wr023333
Sun, B., & Alkhalifah, T. (2020). ML-descent: An optimization algorithm for full-waveform inversion using machine learning. Geophysics, 85(6), R477–R492. https://doi.org/10.1190/geo2019-0641.1
Sun, J., Niu, Z., Innanen, K. A., Li, J., & Trad, D. O. (2020). A theory-guided deep-learning formulation and optimization of seismic wave-
form inversion. Geophysics, 85(2), R87–R99. https://doi.org/10.1190/geo2019-0138.1
Tang, G., Long, D., Behrangi, A., Wang, C., & Hong, Y. (2018). Exploring deep neural networks to retrieve rain and snow in high latitudes
using multisensor and reanalysis data. Water Resources Research, 54(10), 8253–8278. https://doi.org/10.1029/2018wr023830
Tasistro-Hart, A., Grayver, A., & Kuvshinov, A. (2020). Probabilistic geomagnetic storm forecasting via deep learning. Journal of Geophys-
ical Research: Space Physics, 126, e2020JA028228. https://doi.org/10.1029/2020JA028228
Titos, M., Bueno, A., García, L., Benítez, M. C., & Ibañez, J. (2019). Detection and classification of continuous volcano-seismic signals
with recurrent neural networks. IEEE Transactions on Geoscience and Remote Sensing, 57(4), 1936–1948. https://doi.org/10.1109/
tgrs.2018.2870202
Wang, B., Zhang, N., Lu, W., & Wang, J. (2019). Deep-learning-based seismic data interpolation: A preliminary result. Geophysics, 84(1),
V11–V20. https://doi.org/10.1190/geo2017-0495.1
Wang, J., Xiao, Z., Liu, C., Zhao, D., & Yao, Z. (2019). Deep learning for picking seismic arrival times. Journal of Geophysical Research: Solid
Earth, 124(7), 6612–6624. https://doi.org/10.1029/2019jb017536
Wang, J. L., Zhuang, H., Chérubin, L. M., Ibrahim, A. K., & Ali, A. M. (2019). Medium-term forecasting of Loop Current Eddy Cameron and Eddy Darwin formation in the Gulf of Mexico with a divide-and-conquer machine learning approach. Journal of Geophysical Research: Oceans, 124(8), 5586–5606. https://doi.org/10.1029/2019jc015172
Wang, N., Chang, H., & Zhang, D. (2020). Deep-learning-based inverse modeling approaches: A subsurface flow example. Journal of Geo-
physical Research: Solid Earth, 126, e2020JB020549. https://doi.org/10.1029/2020JB020549
Wang, T., Zhang, Z., & Li, Y. (2019). EarthquakeGen: Earthquake generator using generative adversarial networks. In SEG technical program expanded abstracts (pp. 2674–2678). https://doi.org/10.1190/segam2019-3216687.1
Wang, W., & Ma, J. (2020). Velocity model building in a crosswell acquisition geometry with image-trained artificial neural network. Geo-
physics, 85(2), U31–U46. https://doi.org/10.1190/geo2018-0591.1
Wang, W., McMechan, G. A., & Ma, J. (2020). Elastic full-waveform inversion with recurrent neural networks. In SEG technical program
expanded abstracts (pp. 860–864). Society of Exploration Geophysicists. https://doi.org/10.1190/segam2020-3425921.1
Wang, W., McMechan, G. A., Ma, J., & Xie, F. (2021). Automatic velocity picking from semblances with a new deep-learning regression
strategy: Comparison with a classification approach. Geophysics, 86(2), U1–U13. https://doi.org/10.1190/geo2020-0423.1
Wang, Y., Ge, Q., Lu, W., & Yan, X. (2019). Seismic impedance inversion based on cycle-consistent generative adversarial network. In SEG technical program expanded abstracts (pp. 2498–2502). https://doi.org/10.1190/segam2019-3203757.1
Wang, Y., Wang, B., Tu, N., & Geng, J. (2020). Seismic trace interpolation for irregularly spatial sampled data using convolutional autoen-
coder. Geophysics, 85(2), V119–V130. https://doi.org/10.1190/geo2018-0699.1
Wu, H., Zhang, B., Li, F., & Liu, N. (2019). Semiautomatic first-arrival picking of microseismic events by using the pixel-wise convolutional
image segmentation method. Geophysics, 84(3), V143–V155. https://doi.org/10.1190/geo2018-0389.1
Wu, H., Zhang, B., Lin, T., Cao, D., & Lou, Y. (2019). Semiautomated seismic horizon interpretation using the encoder-decoder convolu-
tional neural network. Geophysics, 84(6), B403–B417. https://doi.org/10.1190/geo2018-0672.1
Wu, H., Zhang, B., Lin, T., Li, F., & Liu, N. (2019). White noise attenuation of seismic trace by integrating variational mode decomposition
with convolutional neural network. Geophysics, 84(5), V307–V317. https://doi.org/10.1190/geo2018-0635.1
Wu, X., Geng, Z., Shi, Y., Pham, N., Fomel, S., & Caumon, G. (2020). Building realistic structure models to train convolutional neural net-
works for seismic structural interpretation. Geophysics, 85(4), WA27–WA39. https://doi.org/10.1190/geo2019-0375.1
Wu, X., Liang, L., Shi, Y., & Fomel, S. (2019). FaultSeg3D: Using synthetic data sets to train an end-to-end convolutional neural network for 3D seismic fault segmentation. Geophysics, 84(3), IM35–IM45. https://doi.org/10.1190/geo2018-0646.1
Wu, X., Shi, Y., Fomel, S., Liang, L., Zhang, Q., & Yusifov, A. Z. (2019). FaultNet3D: Predicting fault probabilities, strikes, and dips with a single convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing, 57(11), 9138–9155. https://doi.org/10.1109/tgrs.2019.2925003
Wu, Y., & McMechan, G. A. (2019). Parametric convolutional neural network-domain full-waveform inversion. Geophysics, 84(6), R881–
R896. https://doi.org/10.1190/geo2018-0224.1
Xiao, C., Deng, Y., & Wang, G. (2021). Deep-learning-based adjoint state method: Methodology and preliminary application to inverse
modelling. Water Resources Research, 57(2), e2020WR027400. https://doi.org/10.1029/2020wr027400
Yamaga, N., & Mitsui, Y. (2019). Machine learning approach to characterize the postseismic deformation of the 2011 Tohoku-Oki earth-
quake based on recurrent neural network. Geophysical Research Letters, 46(21), 11886–11892. https://doi.org/10.1029/2019gl084578
Yang, F., & Ma, J. (2019). Deep-learning inversion: A next-generation seismic velocity model building method. Geophysics, 84(4), R583–
R599. https://doi.org/10.1190/geo2018-0249.1
Yang, Q., Tao, D., Han, D., & Liang, J. (2019). Extracting auroral key local structures from all-sky auroral image by artificial intelligence
technique. Journal of Geophysical Research: Space Physics, 124(5), 3512–3521. https://doi.org/10.1029/2018ja026119
Yoo, D., & Kweon, I. S. (2019). Learning loss for active learning. In IEEE conference on computer vision and pattern recognition (pp. 93–102). https://doi.org/10.1109/CVPR.2019.00018
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Proceedings of the neural information processing systems (pp. 3320–3328). https://dl.acm.org/doi/10.5555/2969033.2969197
You, N., Li, Y. E., & Cheng, A. (2020). Shale anisotropy model building based on deep neural networks. Journal of Geophysical Research:
Solid Earth, 125(2), e2019JB019042. https://doi.org/10.1029/2019jb019042
Yu, S., Ma, J., & Osher, S. (2016). Monte Carlo data-driven tight frame for seismic data recovery. Geophysics, 81(4), V327–V340. https://doi.
org/10.1190/geo2015-0343.1
Yu, S., Ma, J., & Wang, W. (2019). Deep learning for denoising. Geophysics, 84(6), V333–V350. https://doi.org/10.1190/geo2018-0668.1
Yu, S., Ma, J., Zhang, X., & Sacchi, M. (2015). Interpolation and denoising of high-dimensional seismic data by learning a tight frame.
Geophysics, 80(5), V119–V132. https://doi.org/10.1190/geo2014-0396.1
Yuan, P., Wang, S., Hu, W., Wu, X., Chen, J., & Van Nguyen, H. (2020). A robust first-arrival picking workflow using convolutional and
recurrent neural networks. Geophysics, 85(5), U109–U119. https://doi.org/10.1190/geo2019-0437.1
Zhang, C., Frogner, C., Araya-Polo, M., & Hohl, D. (2014). Machine-learning based automated fault detection in seismic traces. In 76th EAGE conference and exhibition (pp. 1–5). https://doi.org/10.3997/2214-4609.20141500
Zhang, H., Yang, X., & Ma, J. (2020). Can learning from natural image denoising be used for seismic data interpolation? Geophysics, 85(4), WA115–WA136. https://doi.org/10.1190/geo2019-0243.1
Zhang, K., Zuo, W., Chen, Y., Meng, D., & Zhang, L. (2017). Beyond a Gaussian denoiser: Residual learning of deep CNN for image denois-
ing. IEEE Transactions on Image Processing, 26(7), 3142–3155. https://doi.org/10.1109/tip.2017.2662206
Zhang, K., Zuo, W., Gu, S., & Zhang, L. (2017). Learning deep CNN denoiser prior for image restoration. In IEEE conference on computer vision and pattern recognition (pp. 2808–2817).
Zhang, X., Zhang, J., Yuan, C., Liu, S., Chen, Z., & Li, W. (2020). Locating induced earthquakes with a network of seismic stations in Okla-
homa via a deep learning method. Scientific Reports, 10(1), 1941. https://doi.org/10.1038/s41598-020-58908-5
Zhang, Z., & Alkhalifah, T. (2019). Regularized elastic full-waveform inversion using deep learning. Geophysics, 84(5), R741–R751. https://
doi.org/10.1190/geo2018-0685.1
Zhang, Z., Stanev, E. V., & Grayek, S. (2020). Reconstruction of the basin-wide sea-level variability in the North Sea using coastal data and generative adversarial networks. Journal of Geophysical Research: Oceans, 125(12), e2020JC016402. https://doi.org/10.1029/2020jc016402
Zhang, Z., Wang, H., Xu, F., & Jin, Y. (2017). Complex-valued convolutional neural network and its application in polarimetric SAR image
classification. IEEE Transactions on Geoscience and Remote Sensing, 55(12), 7177–7188. https://doi.org/10.1109/tgrs.2017.2743222
Zhao, M., Chen, S., Fang, L., & David, A. Y. (2019). Earthquake phase arrival auto-picking based on U-shaped convolutional neural network. Chinese Journal of Geophysics, 62(8), 3034–3042. https://doi.org/10.6038/cjg2019M0495
Zhong, Y., Ye, R., Liu, T., Hu, Z., & Zhang, L. (2020). Automatic aurora image classification framework based on deep learning for occurrence distribution analysis: A case study of all-sky image datasets from the Yellow River Station. Journal of Geophysical Research: Space Physics, 125, e2019JA027590. https://doi.org/10.1029/2019JA027590
Zhou, Y., Yue, H., Kong, Q., & Zhou, S. (2019). Hybrid event detection and phase-picking algorithm using convolutional and recurrent
neural networks. Seismological Research Letters, 90(3), 1079–1087. https://doi.org/10.1785/0220180319
Zhu, J., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Inter-
national conference on computer vision. (pp. 2242–2251). https://doi.org/10.1109/ICCV.2017.244
Zhu, W., Mousavi, S. M., & Beroza, G. C. (2019). Seismic signal denoising and decomposition using deep neural networks. IEEE Transac-
tions on Geoscience and Remote Sensing, 57(11), 9476–9488. https://doi.org/10.1109/tgrs.2019.2926772