Deep Learning for Geophysics: Current and Future Trends

Siwei Yu (1) and Jianwei Ma (2)

(1) School of Mathematics, Institute of Artificial Intelligence, Harbin Institute of Technology, Harbin, China; (2) School of Earth and Space Sciences, Center of Artificial Intelligence Geosciences, Peking University, Beijing, China

Key Points:
The concept of deep learning (DL) and classical architectures of deep neural networks are introduced
A review of state-of-the-art DL methods in geophysical applications is provided
The future directions for developing new DL methods in geophysics are discussed

Correspondence to: J. Ma, jwm@pku.edu.cn

Citation: Yu, S., & Ma, J. (2021). Deep learning for geophysics: Current and future trends. Reviews of Geophysics, 59, e2021RG000742. https://doi.org/10.1029/2021RG000742

Received 11 MAR 2021; Accepted 25 MAY 2021

Abstract: Recently, deep learning (DL), as a new data-driven technique compared to conventional approaches, has attracted increasing attention in the geophysical community, resulting in many opportunities and challenges. DL has proven to have the potential to predict complex system states accurately and relieve the "curse of dimensionality" in large temporal and spatial geophysical applications. We address the basic concepts, state-of-the-art literature, and future trends by reviewing DL approaches in various geoscience scenarios. Exploration geophysics, earthquakes, and remote sensing are the main focuses. More applications, including Earth structure, water resources, atmospheric science, and space science, are also reviewed. Additionally, the difficulties of applying DL in the geophysical community are discussed. The trends of DL in geophysics in recent years are analyzed. Several promising directions are provided for future research involving DL in geophysics, such as unsupervised learning, transfer learning, multimodal DL, federated learning, uncertainty estimation, and active learning. A coding tutorial and a summary of tips for rapidly exploring DL are presented for beginners and interested readers of geophysics.

Plain Language Summary: With the rapid development of artificial intelligence (AI), students and researchers in the geophysical community would like to know what AI can bring to geophysical discoveries. We present a review of deep learning (DL), a popular AI technique, for geophysical readers to understand recent advances, open problems, and future trends. This review aims to pave the way for more geophysical researchers, students, and teachers to understand and use DL techniques.

© 2021. The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Geophysics is a discipline that uses physical principles and methods to investigate and characterize the Earth, from the Earth's core to its surface. Modern geophysics extends to outer space, from the outer layers of the Earth's atmosphere to other planets. The general methods of geophysics consist of data observation, processing, modeling, and prediction. Observation is an essential means by which humans come to understand unknown geophysical phenomena. Data observation mainly uses noninvasive techniques such as seismic waves, gravity fields, and remote sensing. Data processing techniques, including denoising and reconstruction, retrieve useful information from raw observations. Mathematical modeling based on physical laws helps to characterize geophysical phenomena. Predictions infer the unknown from the known data and models. Spatial predictions are used to uncover the Earth's interior, as in exploration geophysics, which images the physical properties of the subsurface. Temporal predictions provide the historical or future states of the Earth, as in weather forecasting.
With the advance of acquisition equipment, the amount of observed geophysical data is increasing at an impressive speed. How to utilize such a large amount of data for processing, modeling, and prediction is a significant problem, and addressing it could help remove some of the bottlenecks of traditional geophysical methods. Taking modeling as an example, one of the most challenging tasks is to characterize the Earth with high resolution. However, hardware limitations impose a trade-off in traditional methods that prevents simultaneously achieving both high resolution and a wide observation range. Therefore, it is nearly impossible to obtain a high-resolution model of the Earth, either spatially or temporally, since the Earth has an extremely large spatial and temporal scale. An Earth system numerical simulation facility in China, called EarthLab (Li, Bao, et al., 2019), can at most provide a resolution of 25 km
for the atmosphere and 10 km for the oceans, based on a high-performance computing device with 15 PFLOPS (peta floating-point operations per second). Several specific difficult tasks in geophysics are listed in Table 1.

Table 1. Examples of Data-Driven Tasks in Geophysics
To illustrate the bottlenecks in processing and prediction, we use exploration geophysics as an example. Exploration geophysics aims to observe the Earth's subsurface, or that of other planets, with data collected at the surface, such as seismic and gravity fields. The main process of exploration geophysics includes pre-processing and imaging, where imaging means predicting the subsurface structures. In the geophysical signal pre-processing stage, the simplest assumption regarding the shapes of underground layers is that the reflective seismic records are linear in small windows (Spitz, 1991). The sparsity assumption presumes that the data are sparse under certain transforms (Donoho & Johnstone, 1995), such as the curvelet domain (Herrmann & Hennenfent, 2008) or other time-frequency domains (Mousavi, Langston, et al., 2016; Mousavi & Langston, 2016, 2017). The low-rank assumption supposes that the data are low-rank after the Hankel transform (Oropeza & Sacchi, 2011). However, the predesigned linear assumption or sparse transform assumption is not adaptive to different types of seismic data and may lead to low denoising or interpolation quality for data with complex structures. In the geophysical imaging stage, wave equations are fundamental tools that govern the kinematics and dynamics of seismic wave propagation. Acoustic, elastic, and viscoelastic wave equations introduce an increasing number of physical factors, and the generated wavefield records can estimate real scenarios increasingly precisely. However, as the wave equation becomes increasingly complex, its numerical implementation becomes nontrivial, and the computational cost increases considerably for large-scale scenarios.

Figure 1. An illustration of model-driven and data-driven methods. On the left are the research topics in geophysics, ranging from the Earth's core to outer space. On the right are the observation means used at present. In the middle are examples of model-driven and data-driven methods. In model-driven methods, the principles of geophysical phenomena are induced from a large amount of observed data based on physical causality; the models are then used to deduce the geophysical phenomena in the future or in the past. In data-driven methods, the computer first induces a regression or classification model without considering physical causality. Then, this model performs tasks such as classification on incoming datasets.

Figure 2. The containment relationships among artificial intelligence, machine learning, neural networks, and deep learning, and the classification of deep learning approaches.
Different from traditional model-driven methods, machine learning (ML) is a type of data-driven approach that trains a regression or classification model, through a complex nonlinear mapping with adjustable parameters, based on a training data set. Model-driven and data-driven approaches are compared in Figure 1. For decades, ML methods have been widely adopted in various geophysical applications, such as exploration geophysics (Huang et al., 2006; Helmy et al., 2010; Jia & Ma, 2017; Lim, 2005; Poulton, 2002; Zhang et al., 2014), earthquake localization (Mousavi, Horton, et al., 2016), aftershock pattern analysis (DeVries et al., 2018), and Earth system analysis (Reichstein et al., 2019). A review article about ML in solid Earth geoscience was recently published in Science (Bergen et al., 2019). The topics include a variety of ML techniques, from traditional methods, such as logistic regression, support vector machines, random forests, and neural networks, to modern methods, such as deep neural networks and deep generative models. The article stresses that ML will play a key role in accelerating the understanding of the complex, interacting, and multiscale processes of the Earth's behavior.

Figure 3. (a) and (b) are statistics of artificial intelligence (AI)-related papers in the SEG and AGU libraries. In (a), Geophysics means the flagship journal of SEG, SEG Expanded Abstracts means the expanded abstracts from the SEG annual meeting, and SEG Library papers means the papers found in the SEG digital library. In (b), the first three captions in the legend are the names of top journals in AGU; the fourth caption represents the papers found in the AGU digital library.

Figure 4. The topics included in this review. (a) Deep learning (DL)-based geophysical applications. (b) The future trends of applying DL in geophysics.
In the ML community, an artificial neural network (ANN) is one such regression or classification model; it is analogous to the human brain and consists of layers of neurons. An ANN with more than one layer, that is, a deep neural network (DNN), is the core of a recently developed ML method named deep learning (DL) (LeCun et al., 2015). DL mainly encompasses supervised and unsupervised approaches, depending on whether labels are available or not, respectively. Supervised approaches train a DNN by matching the inputs and labels and are usually used for classification and regression tasks. Unsupervised approaches update the parameters by building a compact internal representation and are then used for clustering or pattern recognition. In addition, DL also contains semi-supervised learning, where partial labels are available, and reinforcement learning, where a human-designed environment provides feedback to the DNN. Figure 2 summarizes the relationship from artificial intelligence to DL and the classification of DL approaches. DL has shown potential in overcoming the limitations of traditional approaches in various areas. The performance of DL is even superior to that of humans in specific tasks, such as image classification (5.1% vs. 3.57% top-5 classification error; He et al., 2016) and the game of Go.
The geophysical community has shown great interest in DL in recent years. Figure 3 shows the published papers related to artificial intelligence in two major geophysical societies, the Society of Exploration Geophysicists (SEG) and the American Geophysical Union (AGU). Clear exponential growth is observed in both libraries due to the use of DL techniques. Moreover, DL has provided several astonishing results for the geophysical community. For instance, on the STanford EArthquake Data set (STEAD), the earthquake detection accuracy is improved to 100%, compared to the 91% accuracy of the traditional STA/LTA (short-term average over long-term average) method (Mousavi, Zhu, Sheng, et al., 2019; Mousavi et al., 2020). DL makes characterizing the Earth with high resolution on a large scale possible (Chattopadhyay et al., 2020; Chen et al., 2019; Zhang, Stanev, & Grayek, 2020). DL can even be used for discovering physical concepts (Iten et al., 2020), such as the heliocentricity of the solar system.
Our review introduces DL-related literature covering a variety of geophysical applications, from the Earth's deep core to distant outer space, and mainly focuses on exploration geophysics, earthquake science, and remote sensing, a geophysical data observation method. This review intends to first provide a glance at the most recent DL research related to geophysics, along with an analysis of the changes and challenges DL brings to the geophysical community, and then discusses future trends. Figure 4 presents the topics included in this review. In addition, we provide a cookbook for beginners who are interested in DL, from geophysical students to researchers.
The introduction above has briefly described the background of geophysics and DL. The remaining content consists of four sections. The second section introduces the basic concepts and ideas of DL (Section 2). The third section reviews DL applications in geophysical areas (Section 3). A discussion of future trends (Section 4) is given as an extension of this review. The fifth section (Section 5) summarizes this review. A tutorial section for beginners is given in the appendix.
2. The Theory of Deep Learning
Readers who are already familiar with the general theory of DL may skip to Section 3. We denote scalars by italic letters, vectors by bold lowercase letters, and matrices by bold uppercase letters. In geophysics, a large number of regression or classification tasks can be reduced to

$\mathbf{y} = \mathbf{L}\mathbf{x}$,  (1)

where x stands for the unknown parameters, y stands for the observation, which we partially know, and L is a forward or degradation operator in geophysical data observation, such as noise contamination, subsampling, or a physical response. However, L is usually ill-conditioned, not invertible, or even unknown. The inverse of L is mainly approximated via two routines: physical model-driven and data-driven. In physical model-driven routines, an optimization objective (loss) function is established with an additional constraint, such as the sparsity constraint in dictionary learning. In data-driven routines, given an extensive training set, a mapping between x and y is established by training, as done in DL, which is especially suitable for situations where L is not precisely known.
To bring the reader into DL gradually, this paper first introduces another approach, dictionary learning (Aharon et al., 2006), since the theoretical frameworks of dictionary learning and DL are similar. In dictionary learning, an adaptive dictionary is learned as a representation of the target data. The key features of dictionary learning are single-level decomposition, unsupervised learning, and linearity. Single-level decomposition means that one dictionary is used to represent a signal. Unsupervised learning means no labels are provided during dictionary learning; moreover, only the target data are used, without an extensive training set. Linearity implies that the decomposition of the data on the dictionary is linear. These features make the theory of dictionary learning simple. This review will help readers transfer existing knowledge of dictionary learning to DL.
2.1. Dictionary Learning
To solve Equation1, an optimization function E(x;y) with a regularization term R is constructed:

;,ED R
x y Lx y x
(2)
where D is a similarity measurement function. Typically, the L2-norm
Lx y
2
is used under the assump-
tion of Gaussian distribution for the error. Tikhonov regularization (
Rxx


2
2
) and sparsity are two pop-
ular regularization terms. In sparsity regularization,
R


1
, where W is a sparse transform with
several vectorized bases. W is also termed as the dictionary. The goal of dictionary learning is to train an
optimized sparse transform W, which is used for the sparse representation of x. The objective function of
dictionary learning involves learning W via matrix decomposition with constraints Rw and Rv on the dic-
tionary W and coefficient v,




,,
wv
E D RR
T
W v LW v y W v
(3)
where W and v are optimized alternatively, that is, dictionary updating and sparse coding. Here we intro-
duce two dictionary learning approaches: K-SVD and data-driven tight frame (DDTF).
Figure 5. An illustration of dictionary learning: data-driven tight frame. The dictionary is initialized with a spline
framelet. After training based on a post-stack seismic data set, the trained dictionary exhibits apparent structures.
K-SVD (where SVD is singular value decomposition) (Aharon et al., 2006) regularizes the sparsity of v and normalizes the energy of W. K-SVD uses orthogonal matching pursuit for sparse coding and several tricks in dictionary updating. First, one component of the dictionary is updated at a given time, and the remaining terms are fixed. Second, a rank-1 approximation SVD algorithm is used to obtain the updated dictionary and coefficients simultaneously, thereby accelerating convergence and reducing computational memory. K-SVD has been applied in geophysics, with extensions to improve its efficiency (Nazari Siahsar et al., 2017).

Despite the success of K-SVD in signal enhancement and compression, dictionary updating is still time-consuming for high-dimensional and large-scale datasets, such as 3D prestack data in seismic exploration. K-SVD requires one SVD step to update each dictionary term. Can the entire dictionary be updated with a single SVD for better efficiency? The data-driven tight frame (Cai et al., 2014; Liang et al., 2014) was proposed by enforcing a tight frame constraint on the dictionary W. The tight frame condition is slightly weaker than orthogonality, and the perfect reconstruction property holds under it. With the tight frame property, dictionary updating in DDTF is achieved with one SVD, which is hundreds of times faster than K-SVD. DDTF has been applied in high-dimensional seismic data reconstruction (Yu et al., 2015, 2016). An example of a dictionary learned with 3D DDTF on a seismic volume is shown in Figure 5.
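To make the alternation in Equation 3 concrete, the following minimal NumPy sketch learns an orthogonal patch dictionary by alternating hard-thresholded sparse coding with a single SVD-based dictionary update (an orthogonal Procrustes step). The orthogonality constraint, the hard threshold, and the random initialization are simplifying assumptions for illustration; this conveys the spirit of a one-SVD update such as DDTF's, not the exact published algorithm.

import numpy as np

def learn_dictionary(patches, n_iter=10, threshold=0.1):
    """Alternate sparse coding and a one-SVD dictionary update.

    patches: (n_patches, patch_dim) array of vectorized data patches.
    Returns an orthogonal dictionary W of shape (patch_dim, patch_dim).
    """
    dim = patches.shape[1]
    W = np.linalg.qr(np.random.randn(dim, dim))[0]   # random orthogonal start
    for _ in range(n_iter):
        # Sparse coding: coefficients of the patches in the current dictionary,
        # followed by hard thresholding to enforce sparsity (the R_v constraint).
        V = patches @ W
        V[np.abs(V) < threshold] = 0.0
        # Dictionary updating: one SVD solves the orthogonal Procrustes problem,
        # giving the orthogonal W that best maps the patches to the sparse V.
        U, _, Vt = np.linalg.svd(patches.T @ V)
        W = U @ Vt
    return W

A dictionary learned this way can then serve as the sparse transform W in the regularization term of Equation 2.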
2.2. Deep Learning
Unlike dictionary learning, DL treats geophysical problems as classification or regression problems. A DNN F is used to approximate x from y:

$\mathbf{x} = F(\mathbf{y}; \Theta)$,  (4)

where $\Theta$ is the parameter set of the DNN. In classification tasks, x is a one-hot encoded vector representing the categories. $\Theta$ is obtained by building a high-dimensional approximation between two sets $X = \{\mathbf{x}_i, 1 \le i \le N\}$ and $Y = \{\mathbf{y}_i, 1 \le i \le N\}$, that is, the labels and inputs. The approximation is achieved by minimizing the following loss function to obtain an optimized $\Theta$:

$E(\Theta; X, Y) = \sum_{i=1}^{N} \|F(\mathbf{y}_i; \Theta) - \mathbf{x}_i\|_2^2$.  (5)

If F is differentiable, a gradient-based method can be used to optimize $\Theta$. However, a large Jacobian matrix is involved when calculating $\partial E / \partial \Theta$ directly, making this infeasible for large-scale datasets. The back-propagation method (Rumelhart et al., 1986) was proposed to compute $\partial E / \partial \Theta$ while avoiding the explicit Jacobian. In unsupervised learning, the label x is not known, so additional constraints are required, such as making the output identical to y.

Figure 6. The learned features in deep learning. (a) Training samples. (b) Nine of the learned filters in each layer. A great number of hierarchical structures are observed in the different layers. Layer 1 exhibits edge structures, layer 2 shows small structures of seismic events, and layer 3 shows small portions of seismic sections. The filters in layers 2 and 3 are blank near the edges, which may be caused by the boundary effect of the convolutional filters. Layer 4 gives larger seismic portions, which are approximations of the training data. The filters in layer 4 look more similar to each other than the training samples do because the deep neural network (DNN) tries to learn the similar, hierarchical patterns that compose the data.
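As a minimal illustration of Equation 5, the following PyTorch sketch fits a small network F(y; Θ) to paired sets of inputs and labels using back-propagation and gradient descent; the two-layer architecture and the random toy data are illustrative assumptions only.

import torch

# Toy paired sets: inputs Y and labels X in the notation of Equation 5.
Y = torch.randn(256, 64)
X = torch.randn(256, 32)

# A small DNN F(y; theta): linear, nonlinear, linear.
F_net = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 32))

opt = torch.optim.SGD(F_net.parameters(), lr=1e-3)
for epoch in range(100):
    loss = ((F_net(Y) - X) ** 2).mean()   # Equation 5, averaged over samples
    opt.zero_grad()
    loss.backward()                       # back-propagation computes dE/dTheta
    opt.step()                            # gradient step updates Theta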
DL relates to dictionary learning through the depth of decomposition, the amount of training data, and the nonlinear operators. Dictionary learning is usually a single-level matrix decomposition problem. Double sparsity (DS) dictionary learning was proposed to explore deeper decomposition (Rubinstein et al., 2010). The motivation of DS is that, for a generic dictionary, the learned dictionary atoms still share several underlying sparse patterns. In other words, the dictionary is represented as a sparse coefficient matrix multiplied by a fixed dictionary, such as the discrete cosine transform. Inspired by DS dictionary learning, can we propose triple, quadruple, or even centuple dictionary learning? We know that cascaded linear operators are equivalent to a single linear operator; therefore, using more than one fixed dictionary does not improve the signal representation ability over one fixed dictionary unless additional constraints are provided. In DL, nonlinear operators are combined in such a deep structure. An ANN with one hidden layer and nonlinear operators can represent any complex function given a sufficient number of hidden neurons. To fit an ANN with many hidden neurons, we need an extensive training set, whereas dictionary learning involves only the target data. In comparison to the learned dictionary features in Figure 5, the hierarchical structures of the filters in DL are shown in Figure 6.
Figure 7. Understanding deep learning (DL) from different perspectives. Optimization: DL is basically a nonlinear optimization problem that solves for the optimized parameters minimizing the loss between the outputs and labels. Dictionary learning: the filter training in DL is similar to that in dictionary learning. High-dimensional mapping: the deep neural network (DNN) in DL is basically a high-dimensional mapping from the input to the labels. Optimal transport: a generative adversarial network can be interpreted by the theory of optimal transport, which involves the transformation between the given white noise and the data distribution. Manifold learning: the representation of training samples in the latent space of a DNN is similar to learning a low-dimensional manifold that contains all the data samples. Ordinary differential equation: a recurrent neural network is basically a solution of an ordinary differential equation with the Euler method.
The theory of DL can be understood from angles other than dictionary learning (Figure 7). On the one hand, DL can be treated as an ultra-high-dimensional nonlinear mapping from the data space to the feature or target space, where the nonlinear mapping is represented by a DNN. Therefore, DL is basically a high-dimensional nonlinear optimization problem. On the other hand, recurrent neural networks (RNNs) are basically a solution of an ordinary differential equation with the Euler method (Chen et al., 2018). A generative adversarial network (GAN) (Creswell et al., 2018; Goodfellow et al., 2014) can also be interpreted through the theory of optimal transport, since the targets of a GAN are mainly manifold learning and probability distribution transformation, that is, transformation between the given white noise and the data distribution (Lei et al., 2020). RNNs and GANs are two specific DNNs and will be introduced in the next subsection.

Figure 8. Sketches of deep neural networks (DNNs). The blue lines indicate inputs, and the orange lines indicate outputs. The lengths of the blue and orange lines represent the data dimensions. The green lines indicate intermediate connections. (a) In a fully connected neural network (FCNN), the inputs of one layer are connected to every unit in the next layer. f stands for a nonlinear activation function. In (b)–(f), we omit the details of the layers and keep the shape of each network architecture. (b) A vanilla convolutional neural network (CNN) is a cascade of convolutional layers, pooling layers, nonlinear layers, etc. In a CNN, the outputs of the convolutional layers are either the same size as or smaller than the inputs, depending on the strides used for convolution. Pooling layers reduce the size of the extracted features. In regression or classification tasks, the output usually has the same dimension as or a smaller dimension than the input ((b) shows the latter situation). The difference between regression and classification is that the outputs are continuous variables in regression tasks and discrete variables representing categories in classification tasks. (c) The dimension of the latent feature space in a convolutional autoencoder (CAE) may be either larger or smaller than that of the data space ((c) shows the latter). (d) Skip connections in U-Net bring the low-level features to a high level. (e) In a generative adversarial network (GAN), low-dimensional random vectors are used to generate a sample with the generator, and then the sample is classified as true or false by the discriminator. (f) In a recurrent neural network (RNN), the output or hidden state of the network is fed back as input in a cycle.

Figure 9. Details of deep neural network (DNN) architectures. (a) Activation functions in the nonlinear layer. ReLU is commonly used since its gradient is easily computed and it helps avoid vanishing gradients. (b) A typical block in a convolutional neural network (CNN). The convolutional layer and ReLU (nonlinear) layer are the basic components of a CNN block. The batch normalization layer helps avoid gradient explosion. The pooling layer extracts features by subsampling the input.
2.3. Deep Neural Network Architectures
The key components of DL are the training set, the network architecture, and parameter optimization. The architectures of DNNs vary across applications; here, we introduce several commonly used architectures.

A fully connected neural network (FCNN) (Figure 8a) is an ANN composed of fully connected layers, in which the inputs of one layer are connected to every unit in the next layer. In each unit, the weighted summation of the inputs passes through a nonlinear activation function f. Typical choices of f in DL are the rectified linear unit (ReLU), sigmoid, and tanh functions, as shown in Figure 9a. The number of layers in an FCNN has a significant effect on the fitting and generalization abilities of the model. However, FCNNs were long restricted to a few layers due to the computational capacity of the available hardware, the vanishing and exploding gradient problems during optimization, and so on. With the development of hardware and optimization algorithms, ANNs tend to become deeper. On the other hand, if raw data are input directly into an FCNN, a massive number of parameters is required, since each pixel corresponds to one feature, especially for high-dimensional inputs. Features basically reduce the dimension at the input layer and, as a result, the number of parameters in the model. However, an FCNN requires preselected features, which rely fully on experience, and it ignores the structure of the input entirely. Automated feature selection algorithms have been proposed (Qi et al., 2020) but require substantial computational resources. To reduce the number of parameters in an FCNN and consider the local coherency of an image, convolutional neural networks (CNNs) (Figure 8b) were proposed, which share network parameters through convolutional filters.
CNNs have developed rapidly since 2010 for image classification and segmentation; popular CNNs include VGGNet (Simonyan & Zisserman, 2015) and AlexNet (Krizhevsky et al., 2017). CNNs are also used in image denoising (Zhang, Zuo, Chen, et al., 2017) and super-resolution tasks (Dong et al., 2014). A CNN uses the original data rather than selected features as input and uses convolutional filters to restrict the inputs of a neuron to a local range. The convolutional filters are shared by different neurons in the same layer. As shown in Figure 9b, one typical block in a CNN consists of one convolutional layer, one nonlinear layer, one batch normalization layer, and one pooling layer. Convolutional layers and nonlinear layers are the basic components of a CNN. Batch normalization layers prevent gradient explosion and stabilize training. Pooling layers subsample the input to extract key features. The simplest CNNs, with simple sequential structures, are called vanilla CNNs (likewise for vanilla FCNNs). Vanilla CNNs are reliable for most applications in geophysics, such as denoising, interpolation, velocity modeling, and data interpretation, if many training samples and labels are available. A CNN is invariant to small changes in the inputs due to the pooling layers. However, pooling layers lose information, so a CNN cannot characterize the changes in the input. Capsule networks (Sabour et al., 2017) were proposed to simultaneously keep the invariance and characterize the changes. This is achieved by replacing scalars with vectors as the inputs and outputs of the neurons: the length of a vector represents the probability that an entity exists, and its orientation stands for the parameters of the entity.
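The typical CNN block of Figure 9b (convolution, batch normalization, nonlinearity, pooling) can be sketched in PyTorch as follows; the channel counts and kernel sizes are arbitrary illustrative choices.

import torch

# One vanilla CNN block: convolution -> batch normalization -> ReLU -> pooling.
block = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, kernel_size=3, padding=1),  # shared convolutional filters
    torch.nn.BatchNorm2d(16),                          # stabilizes training
    torch.nn.ReLU(),                                   # nonlinear layer
    torch.nn.MaxPool2d(2),                             # subsampling feature extraction
)
features = block(torch.randn(8, 1, 64, 64))  # e.g., 8 single-channel seismic patches
print(features.shape)  # (8, 16, 32, 32): pooling halves the spatial size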
More DL network architectures have been proposed for specific tasks based on vanilla FCNNs or CNNs. An autoencoder learns to reconstruct the inputs through useful representations with an encoder and a decoder (Makhzani, 2018). The encoder uses nonlinear layers to map the inputs to a latent space. The decoder uses nonlinear layers to decode the latent features back into the original data space. Autoencoders are trained in a self-supervised manner. To obtain meaningful representations, additional constraints are imposed on the network. For example, undercomplete autoencoders limit the size of the latent space to be smaller than that of the inputs, such that the encoder extracts critical features. Sparse autoencoders are usually overcomplete, with a latent space larger than the input space, and impose a sparsity regularization on the latent space. Denoising autoencoders and contractive autoencoders learn useful representations by making the autoencoder robust to variations in the input. Convolutional autoencoders (CAEs, Figure 8c) use convolutional layers in the encoder and deconvolutional layers in the decoder.
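A minimal convolutional autoencoder in this spirit pairs a strided convolutional encoder with a transposed-convolution (deconvolutional) decoder and is trained self-supervised to reconstruct its input; the layer sizes below are illustrative assumptions.

import torch

# Encoder: strided convolution maps the data to a smaller latent space.
encoder = torch.nn.Sequential(
    torch.nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), torch.nn.ReLU())
# Decoder: transposed convolution maps latent features back to the data space.
decoder = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(8, 1, kernel_size=4, stride=2, padding=1))

x = torch.randn(4, 1, 32, 32)         # e.g., four seismic patches
recon = decoder(encoder(x))
loss = ((recon - x) ** 2).mean()      # self-supervised reconstruction loss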
U-Nets (Ronneberger et al., 2015) (Figure 8d) have U-shaped structures and skip connections. The skip connections bring low-level features to high levels. U-Net was first proposed for image segmentation and has since been applied in seismic data processing, inversion, and interpretation. The U-shaped structure, with a contracting path and an expanding path, lets every data point in the output draw on information from the entire input, making the approach suitable for mapping data between different domains, such as inverting velocity from seismic records. For a trained U-Net, the input size of the test set must be the same as that of the training set; the data need to be processed patch-wise if their size does not match the network's requirement.
A GAN (Figure8e) can be applied in adversarial training with one generator to produce a fake image or any
other type of data and one discriminator to distinguish the produced one from the real ones. When training
the discriminator, the real data set and generated data set correspond to labels one and zero, respectively.
Additionally, when the generator is trained, all datasets correspond to the label one. Such a game will finally
allow the generative network to produce fake images that the discriminative network cannot distinguish
from real images. A GAN is used to generate samples with similar distributions as the training set. The gen-
erated samples are used for simulating realistic scenarios or expanding the training set. An extended GAN,
named CycleGAN, was proposed with two generators and two discriminators for signal processing (Zhu
etal.,2017). In CycleGAN, a two-way mapping is trained for mapping two datasets from one to the other.
The training set of CycleGAN is not necessarily paired as in a vanilla CNN, which makes it relatively easy
to construct training sets in geophysical applications.
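The label convention described above (one for real and zero for generated samples in the discriminator step; one for all generated samples in the generator step) translates directly into two alternating loss computations. The sketch below uses toy fully connected networks as stand-ins for the generator and discriminator.

import torch

# Toy fully connected stand-ins for the generator G and discriminator D.
G = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                        torch.nn.Linear(32, 64))
D = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                        torch.nn.Linear(32, 1), torch.nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = torch.nn.BCELoss()
ones, zeros = torch.ones(16, 1), torch.zeros(16, 1)

real = torch.randn(16, 64)     # stand-in for a batch of real samples
noise = torch.randn(16, 8)     # low-dimensional random input vectors

# Discriminator step: real samples get label one, generated samples label zero.
d_loss = bce(D(real), ones) + bce(D(G(noise).detach()), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: all generated samples get label one to fool the discriminator.
g_loss = bce(D(G(noise)), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()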
RNNs (Figure8f) are commonly used for tasks related to sequential data, where the current state depends
on the history of inputs fed into the neural network. Long short-term memory (LSTM) (Hochreiter &
Schmidhuber,1997) is a widely used RNN that considers how much historical information is forgotten or
remembered. The main advantage of LSTM is in handling longer time duration of data compared to the
vanilla RNN, which has vanishing gradient problem for long sequences. Therefore, the inference accuracy
of LSTM increases with the amount of historical information considered. Gated recurrent unit (GRU) (Cho
etal.,2014) is a variant of LSTM with a simpler architecture. Compared to LSTM, GRU has similar perfor-
mance with fewer parameters, such that is computationally cheaper. In geophysical applications, RNNs are
YU AND MA
10.1029/2021RG000742
11 of 36
Figure 10. The procedure of exploration geophysics. (a) The subsurface structures. The seismic wave is excited at
sources (red point) and propagates downward to the reflector and then propagates upwards until recorded by the
receivers (blue points). (b) The seismic records are after processing. (c) The seismic imaging result, where the lines
stand for the reflectors. (d) Underground properties are interpreted to determine where the reservoir locates.
Reviews of Geophysics
mainly used for predicting the next sample of a temporally or spatially sequenced data set. RNNs are also
used for seismic wavefield or earthquake signal modeling by simulating the time-dependent discrete partial
differential equation.
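For the next-sample prediction task mentioned above, a minimal LSTM regressor in PyTorch might look as follows; the hidden size and the sliding-window setup are illustrative assumptions.

import torch

class NextSamplePredictor(torch.nn.Module):
    """Predict the next sample of a sequence from its history with an LSTM."""
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, seq):              # seq: (batch, time, 1)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])     # predict from the last hidden state

model = NextSamplePredictor()
window = torch.randn(4, 100, 1)          # four windows of 100 past samples each
next_sample = model(window)              # shape (4, 1)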
3. DL Geophysical Applications
The most direct method for applying DL in geophysics is translating geophysical tasks into computer vision tasks, such as denoising or classification. However, in certain geophysical applications, the characteristics of the tasks or data are quite different from those in computer vision. For example, in geophysics, we have large-scale and high-dimensional data but fewer annotated labels. In this section, we introduce how DL approaches relieve the bottlenecks of traditional methods, what difficulties we encounter, and how to solve them. The development of DL applications in exploration geophysics is reviewed first, followed by applications in earthquake science, remote sensing, and other areas.

Figure 11. Comparison of traditional and DL-based methods in exploration geophysics. (a) In random noise denoising tasks, the curvelet denoising method (Herrmann & Hennenfent, 2008) assumes that the signal is sparse under the curvelet transform, and a matching method is used for denoising. In velocity inversion tasks, full-waveform inversion based on the wave equation is used for forward and adjoint modeling in the optimization algorithm. In fault interpretation tasks, faults are picked by interpreters. (b) These tasks are treated as regression problems that are optimized with neural networks. Different tasks may require different neural network architectures.
3.1. Exploration Geophysics
Exploration geophysics images the Earth's subsurface by inverting physical fields collected at the surface, among which seismic wavefields are the most commonly used. Seismic exploration uses reflected seismic waves to predict subsurface structures. The main processes of seismic exploration consist of seismic data sampling and processing (denoising, interpolation, etc.), inversion (migration, imaging, etc.), and interpretation (fault detection, facies classification, etc.). Figure 10 summarizes the procedure of exploration geophysics, and Figure 11 compares traditional and DL-based methods in exploration geophysics.
3.1.1. Seismic Data Processing
Seismic data are contaminated by different types of noise, such as random noise from the background, ground roll that travels along the surface with high energy and masks useful signals, and multiples that reflect multiple times between interfaces. One of the long-standing problems in exploration geophysics is to remove noise and improve the signal-to-noise ratio (SNR) of the signals. Traditional methods use handcrafted filters or regularization to suppress certain kinds of noise by analyzing the corresponding features (Herrmann & Hennenfent, 2008). However, handcrafted filters fail when the signal and noise share a common feature space.

DL methods avoid feature selection when used for seismic denoising. For example, the U-Net-based DeepDenoiser can separate signals and noise by learning a nonlinear regression (Zhu et al., 2019). Moreover, with DnCNN (Zhang, Zuo, Chen, et al., 2017), a CNN for denoising, the same architecture can be used for three kinds of seismic noise while achieving a high SNR (Yu et al., 2019), as long as a corresponding training set is constructed. However, there is still a long way to go. A DNN trained on synthetic datasets does not generalize well to field data. To make the network reusable, transfer learning (Donahue et al., 2014) can be used for field data denoising. Sometimes clean-data labels are difficult to obtain, and one solution is to use multiple trials involving user-generated white noise to simulate real white noise (Wu, Zhang, Lin, Li, & Liu, 2019).

An example of scattered ground-roll attenuation is shown in Figure 12 (Yu et al., 2019). Scattered ground roll is mainly observed in desert areas and is caused by the scattering of ground roll where the near surface is laterally heterogeneous. It is difficult to remove because it occupies the same frequency band as the reflected signals. DnCNN was used to remove the scattered ground roll successfully.

Figure 12. Deep learning for scattered ground-roll attenuation. On the left is the original noisy data set. On the right is the denoised data set. The scattered ground roll marked by the red arrows is removed.
Due to environmental or economic limitations, seismic geophones are usually placed irregularly or not densely enough to satisfy the Nyquist sampling principle. The reconstruction or regularization of seismic data onto a dense and regular grid is essential for improving inversion resolution. Initially, end-to-end DNNs were proposed for the reconstruction of regularly missing data (Wang, Zhang, Lu, et al., 2019) and randomly missing data (Mandelli et al., 2018; Wang, Wang, et al., 2020). However, the training sets are numerically synthetic and do not generalize well to field data. We can instead borrow training data from a natural image data set to train DnCNN and then embed it in the traditional projection onto convex sets (POCS; Abma & Kabir, 2006) framework (Zhang, Yang, et al., 2020). The resulting interpolation algorithm generalizes well to seismic data; moreover, no new networks are required for the interpolation of other datasets. Figure 13 shows the training set and a simple interpolation result (Zhang, Yang, et al., 2020).

Figure 13. The training set and seismic interpolation result (Zhang, Yang, et al., 2020). (a) A subset of the natural image data set, which was used to train a network for seismic data interpolation. (b) An under-sampled seismic record. (c) The interpolated record corresponding to (b). The region from 1.6 to 1.88 s and 1.0 to 1.375 km is enlarged in the top-right corner.
First arrival picking selects the first arrivals of useful signals. It has been automated but needs intense human intervention to check picks with significant static corrections, weak energy, low signal-to-noise ratios, and dramatic phase changes. DL helps improve the automation and accuracy of first arrival picking on realistic seismic data. When DL is used, it is natural to transform first arrival picking into a classification problem by setting the first arrival to one and all other locations to zero (Hu et al., 2019). However, such a setting causes imbalanced labels. An interesting approach instead treats first arrival picking as an image segmentation problem, where everything before the first arrival is set to zero and everything after it is set to one (Wu, Zhang, Li, et al., 2019). This method works well for noisy situations and field datasets. After the segmentation image is obtained, a more advanced picking algorithm, such as an RNN, can be applied to take advantage of the global information (Yuan et al., 2020).
Figure14 shows the results of the first arrival picking based on U-Net. We used 8,000 synthetic seismologi-
cal samples. A gradient constraint was added to the loss function to enhance the continuity of the selected
positions. For the output, three classifications were set: zeros before the first arrival, ones after the first arriv-
al, and twos for the first arrival. The training data set was contaminated with strong noise and had missing
traces. The predicted picking results were close to the labels.
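The loss used in this experiment combines a per-pixel classification term with a lateral-continuity term. A sketch of such a loss follows; the weighting and the exact form of the gradient constraint are assumptions for illustration.

import torch
import torch.nn.functional as F

def picking_loss(logits, labels, lam=0.1):
    """Cross-entropy plus a horizontal-gradient penalty for lateral continuity.

    logits: (batch, 3, nt, nx) outputs for the three classes.
    labels: (batch, nt, nx) integer labels (0 before, 1 after, 2 on the arrival).
    """
    ce = F.cross_entropy(logits, labels)
    probs = torch.softmax(logits, dim=1)
    # Penalize abrupt trace-to-trace changes in the predicted probabilities.
    continuity = (probs[..., 1:] - probs[..., :-1]).abs().mean()
    return ce + lam * continuity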
Additional DL-based seismic signal processing literature that does not belong to the above scope is summarized in this paragraph. Signal compression is essential for the storage and transmission of seismic data. Traditionally, seismic data are stored at 32 bits per sample. With an RNN that estimates the relationships among samples in a seismic trace and compresses the data, only 16 bits are needed for lossless representation, such that half the storage is saved (Payani et al., 2019). Seismic registration aligns seismic images for tasks such as time-lapse studies. However, when large shifts and rapid changes exist, this task is extremely difficult. Borrowing from the concept of optical flow, a CNN can be trained with two seismic images as inputs and the shift as output. The method outperforms traditional methods but is dependent on the training data set (Dhara & Bagaini, 2020).
3.1.2. Seismic Data Imaging
Seismic imaging is a challenging problem, since traditional methods such as tomography and full waveform inversion (FWI) suffer from several bottlenecks: (a) imaging is time-consuming due to the curse of dimensionality; (b) imaging relies heavily on human interaction to select proper velocities; and (c) nonlinear optimization needs a good initialization or low-frequency information, yet recorded data lack low-frequency energy. DL methods help relieve these bottlenecks from several angles.

First, end-to-end DL-based imaging methods use recorded data as inputs and velocity models as outputs, which provides a totally different imaging approach. Such DL methods avoid the bottlenecks mentioned above, providing a next-generation imaging method. The first attempts at DL in stacking (Park & Sacchi, 2019), tomography (Araya-Polo et al., 2018), and FWI (Yang & Ma, 2019) show promising results on synthetic 2D data. One important issue is that the input lies in the data space and the output lies in the model space, both with high-dimensional parameters. U-Net is used to map between these spaces of different dimensions, and downsampling is used to reduce the number of parameters while training the DNN (Yang & Ma, 2019). Figure 15 shows the velocity inversion results from Yang and Ma (2019).

Figure 15. Predicting the velocity model with U-Net from raw seismological data (Yang & Ma, 2019). The columns indicate different velocity models. From top to bottom are the ground-truth velocity models, the seismic records generated from one shot, and the predicted velocity models.
However, end-to-end DL imaging also has disadvantages, such as a lack of training samples and restricted input sizes due to memory limitations. An interesting work used smoothed natural images as velocity models, thus producing a large number of models to construct the training set (Wang & Ma, 2020). Figure 16 shows an example of how Wang and Ma (2020) convert a three-channel color image into a velocity model.

Figure 16. Converting a three-channel color image into a velocity model (Wang & Ma, 2020). (a–c) The original color image, the grayscale image, and the corresponding velocity model. (d) The seismic record generated from a cross-well geometry on (c).
To make DL-based imaging applicable to large-scale inputs, more works aim to collaborate with traditional methods and solve one of the bottlenecks mentioned above, such as extrapolating the frequency range of seismic data from high to low frequencies for FWI (Fang, Zhou, et al., 2020; Ovcharenko et al., 2019) or adding constraints to FWI (Zhang & Alkhalifah, 2019). To mitigate the "curse of dimensionality" in the global optimization of FWI, a CAE is used to reduce the dimension of FWI by optimizing in the latent space (Gao et al., 2019). Another work addresses the high computational cost of forward modeling when a high-order finite-difference method is used: a GAN is used to produce a high-quality wavefield from a low-quality wavefield computed with a lower-order finite difference, in the context of surface-related multiples, ghosts, and dispersion (Siahkoohi et al., 2019). U-Net can also be used for velocity picking in stacking (Figure 17; Wang et al., 2021). The inputs are seismological data, and the outputs have values of one where the picks are located and zero elsewhere.

Figure 17. Velocity picking based on U-Net (Wang et al., 2021). The inputs are the seismological data on the left. The outputs are the picking positions on the right. AP means the approximate root mean square velocity. PD_REG and PD_CLS represent the velocity predictions of the regression network and the classification network, respectively.
An alternative is to replace the FWI objective with an RNN loss function. The structure of an RNN is similar to that of finite-difference time evolution, and the network parameters correspond to the velocity model. Therefore, optimizing an RNN is equivalent to optimizing FWI (Sun, Niu, et al., 2020). Such a strategy has been extended to the simultaneous inversion of velocity and density (Liu, 2020). Figure 18 shows the structure of the modified RNN based on the acoustic wave equation used in Liu (2020); the diagram represents the discretized wave equation implemented in an RNN as a flow chart. The optimization method in FWI can also be learned by a DNN rather than being a handcrafted gradient-descent-based approach (Sun & Alkhalifah, 2020): an ML-descent method is proposed that uses an RNN to exploit the historical information of the gradient rather than handcrafted directions.

Figure 18. A modified recurrent neural network (RNN) based on the acoustic wave equation for wave modeling (Liu, 2020). The diagram represents the discretized wave equation implemented in an RNN. The automatic differentiation mechanism of deep neural networks (DNNs) helps to efficiently optimize the velocity and density.
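The analogy between an RNN and finite-difference time stepping (Figure 18) can be made explicit: each time step of the discretized wave equation acts as one recurrent cell, the velocity model plays the role of the trainable parameters, and automatic differentiation then yields the FWI gradient. A 1D constant-density sketch under these simplifying assumptions (periodic boundaries, a single source and receiver) is given below.

import torch

def forward_model(v, wavelet, dx=10.0, dt=1e-3):
    """Time-step the 1D acoustic wave equation; each step acts as one RNN cell.

    v: (nx,) velocity model, a trainable torch tensor.
    wavelet: (nt,) source wavelet injected at the central grid point.
    """
    nx = v.shape[0]
    src = torch.zeros(nx)
    src[nx // 2] = 1.0                                   # one-hot source location
    u_prev, u, trace = torch.zeros(nx), torch.zeros(nx), []
    for amp in wavelet:
        # Second-order finite differences in space (periodic boundaries) and time.
        lap = (torch.roll(u, -1) - 2 * u + torch.roll(u, 1)) / dx ** 2
        u_next = 2 * u - u_prev + (v * dt) ** 2 * lap + amp * src
        u_prev, u = u, u_next
        trace.append(u[5])                               # record at one receiver
    return torch.stack(trace)

v = torch.full((200,), 2000.0, requires_grad=True)       # initial velocity guess
# Given a source wavelet and observed data (both tensors), the FWI gradient is
# obtained purely by automatic differentiation:
# loss = ((forward_model(v, wavelet) - observed) ** 2).sum(); loss.backward()
# v.grad then holds the gradient used to update the velocity model.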
3.1.3. Seismic Data Interpretation and Attribute Analysis
Seismic interpretation (faults, layers, dips, etc.) and attribute analysis (impedance, frequency, facies, etc.) help extract subsurface geologic information and locate underground sweet spots. However, both tasks are time-consuming, since interventions by experts are required. Preliminary works show that DL has the potential to improve the efficiency and accuracy of seismic interpretation and attribute analysis.
The localization of faults, layers, and dips in seismic interpretation is similar to object detection in computer vision. Therefore, DNNs for image detection can be directly applied in seismic interpretation. However, unlike in the computer vision industry, it is difficult to obtain a public training set or to manually construct a training set from field datasets. Building realistic synthetic datasets rather than handcrafting field datasets is more efficient and can produce similar results; therefore, synthetic samples are used for training. To build an approximately realistic 3D training data set, folding and faulting parameters are chosen randomly within a reasonable range (Wu et al., 2020). The data set is then used to train a 3D U-Net for the seismic structural interpretation of features, such as faults, layers, and dips, in field datasets. If the detected objects make up only a small proportion of the data, a class-balanced binary cross-entropy loss function is used to adjust for the data imbalance so that the network is not trained to predict only zeros (Wu, Liang, et al., 2019), as sketched below. An alternative to a synthetic training set is a semi-automated approach that annotates the targets on a coarse scale and predicts them on a fine scale (Wu, Zhang, Lin, Cao, et al., 2019). An example of fault analysis on synthetic post-stack and field data is shown in Figure 19 (Wu et al., 2020).

Figure 19. (a) A post-stack dataset. (b) The fault prediction result for (a). (c) A synthetic dataset (Wu et al., 2020).
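The class-balanced binary cross-entropy mentioned above reweights the sparse fault pixels so that predicting all zeros is no longer a good minimum. One common form is sketched here; the weighting scheme is a standard choice and not necessarily the exact one used in the cited work.

import torch

def balanced_bce(pred, target, eps=1e-7):
    """Binary cross-entropy with weights set by the fault/non-fault pixel ratio.

    pred: sigmoid outputs in (0, 1); target: binary fault labels, same shape.
    """
    beta = 1.0 - target.mean()            # fraction of non-fault (zero) pixels
    loss = -(beta * target * torch.log(pred + eps)
             + (1.0 - beta) * (1.0 - target) * torch.log(1.0 - pred + eps))
    return loss.mean()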
Attribute analysis is similar to image classification, where seismic images are the inputs and areas labeled with different attributes are the outputs. Therefore, DNNs for image classification can be directly applied in seismic attribute analysis (Das et al., 2019; Feng, Mejer Hansen, et al., 2020; You et al., 2020). If the attributes cannot be directly computed from the seismic data, a DNN can work in a cascaded way (Das & Mukerji, 2020). If labels are not available, a CAE is used for feature extraction, and then a clustering method, such as K-means, is used for unsupervised clustering (Duan et al., 2019; He et al., 2018; Qian et al., 2018). Clustering refers to grouping similar attributes in an unsupervised manner; for example, we can use clustering to decide whether a region contains fluvial facies or faults based on stacked sections. The CAE and K-means can further be optimized simultaneously for better feature extraction (Mousavi, Zhu, Ellsworth, et al., 2019). To mitigate the dependence of vanilla CNNs on the amount of labeled seismic data available, a 1D CycleGAN-based algorithm was proposed for impedance inversion (Wang, Ge, et al., 2019). The CycleGAN does not require a paired training set; only two sets, with and without high fidelity, are needed. To consider the spatial continuity and similarity of adjacent traces, an RNN is used in facies analysis (Li, Lin, et al., 2019).
3.2. Earthquake Science
The goal of earthquake data processing is quite different from that of exploration geophysics; therefore,
this section focuses on DL-based earthquake signal processing. The preliminary processing of earthquake
signals includes classification to distinguish real earthquakes from noise and arrival picking to identify the
arrival times of primary (P) and secondary (S) waves. Further applications involve earthquake location and
Earth tomography. DL has shown promising results in these applications.
3.2.1. Earthquake and Noise Classification
Earthquake signal and noise classification is the most fundamental and difficult task in earthquake early warning (EEW). Traditional EEW systems suffer from false and missed alerts. DNNs can be directly applied to signal and noise discrimination, since it is a classification task. With a sufficient training set, DNNs have achieved precisions of up to 99.2% (Li et al., 2018) and 99.5% (Meier et al., 2019) in different regions. To detect small and weak earthquake signals robustly against strong noise and non-earthquake signals, a residual network with convolutional and recurrent units was developed (Mousavi, Zhu, Sheng, et al., 2019). RNNs and CNNs are also used in a more challenging task: distinguishing between anthropogenic sources, such as mining or quarry blasts, and tectonic seismicity (Linville et al., 2019). More categories of signals need to be identified in specific tasks, such as volcano seismic detection (Titos et al., 2019). Volcano seismic signals can be classified into six classes: long-period events, volcanic tremors, volcano-tectonic events, explosions, hybrid events, and tornillos (Malfante et al., 2018). Uncertainty is also considered in volcano-seismic monitoring (Bueno et al., 2019).
We provide an example of using the wavelet scattering transform (WST) (Mallat, 2012) and a support vector machine for earthquake classification with a limited number of training samples. The WST involves a cascade of wavelet transforms, a modulus operator, and an averaging operator, corresponding to the convolutional filters, nonlinear operator, and pooling operator in a CNN, respectively. The critical difference between the WST and a CNN is that the WST's filters are predesigned by the wavelet transform. In our case, only 100 records were used for training, and 2,000 records were used for testing. We obtained a classification accuracy as high as 93% with the WST method. Figure 20 shows the architecture of the WST algorithm.

Figure 20. The architecture of the wavelet scattering transform (WST). Unlike in a convolutional neural network (CNN), the outputs of the WST are combined from the outputs of each layer. The outputs of the WST then serve as features for a classifier.
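The cascade in Figure 20 (wavelet convolution, modulus, averaging) can be sketched with plain NumPy. The Morlet-like filters, fixed filter length, and two-layer depth are illustrative simplifications of the full scattering transform, not the exact construction used in our experiment.

import numpy as np

def morlet(n, freq, width=6.0):
    """A Morlet-like complex wavelet filter (illustrative design only)."""
    t = np.arange(-n // 2, n // 2)
    return np.exp(2j * np.pi * freq * t) * np.exp(-((t / width) ** 2))

def scattering_features(x, freqs=(0.05, 0.1, 0.2)):
    """Two-layer scattering: cascades of |x * psi| followed by averaging."""
    feats = [np.mean(np.abs(x))]                                    # zeroth order
    for f1 in freqs:
        u1 = np.abs(np.convolve(x, morlet(64, f1), mode="same"))   # wavelet + modulus
        feats.append(u1.mean())                                    # first-order feature
        for f2 in freqs:
            u2 = np.abs(np.convolve(u1, morlet(64, f2), mode="same"))
            feats.append(u2.mean())                                # second-order feature
    # The concatenated features can be fed to a classifier such as an SVM.
    return np.array(feats)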
3.2.2. Arrival Picking
Arrival picking for earthquakes identifies the arrival times of P and S waves. Traditional automated arrival picking algorithms, such as the short-term average/long-term average (STA/LTA) method, are less precise than human experts and rely on threshold settings. DL-based arrival picking overcomes these shortcomings and helps illuminate the Earth's structure clearly (Wang, Xiao, et al., 2019). With a sufficiently large training set, one can achieve picking and classification accuracies remarkably higher than STA/LTA (Zhao et al., 2019; Zhou et al., 2019), even close to or better than human experts (Ross et al., 2018, with a training set of 4.5 million seismograms). If labels are not sufficient, a GAN-based model, EarthquakeGen, can be used to artificially expand labeled data sets (Wang, Zhang, & Li, 2019); the detection accuracy was greatly improved by this artificial augmentation of the training set. Simultaneous earthquake detection and phase picking can further improve the accuracy of both tasks (Mousavi et al., 2020; Zhou et al., 2019).
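For reference, the traditional STA/LTA trigger that these DL pickers are compared against can be written in a few lines of NumPy; the window lengths and the threshold are typical illustrative values rather than universal settings.

import numpy as np

def sta_lta(trace, n_sta=50, n_lta=500, threshold=3.0):
    """Flag samples where the short-term average energy exceeds the long-term
    average energy by a threshold factor (the classic STA/LTA trigger)."""
    energy = np.asarray(trace, dtype=float) ** 2
    sta = np.convolve(energy, np.ones(n_sta) / n_sta, mode="same")   # short window
    lta = np.convolve(energy, np.ones(n_lta) / n_lta, mode="same")   # long window
    ratio = sta / (lta + 1e-12)                                      # avoid division by zero
    return np.flatnonzero(ratio > threshold)                         # candidate onsets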
3.2.3. Earthquake Location and Other Applications
Earthquake location and magnitudes estimation are important in EEW and subsurface imaging. Conven-
tional earthquake location significantly relies on a velocity model and suffers from inaccurate phase picking.
A CNN has been used for earthquake location, taking the waveforms received at several stations as input and a location map as output (Zhang, Zhang, et al., 2020). This method worked well for small earthquakes (ML < 3.0) with low SNRs, for which traditional methods fail. The prediction results and errors of earthquake source locations are indicated in Figure 21. DL also helps estimate earthquake locations and magnitudes based on signals from a single station (Mousavi & Beroza, 2020a; Mousavi & Beroza, 2020b). Further applications include seismic phase association, which groups the phase picks from multiple stations associated with an individual event (Ross et al., 2019), and analysis of the relationship between a strong earthquake and postseismic deformation (Yamaga & Mitsui, 2019).

Figure 21. Locating earthquake sources with deep learning. The black triangles are stations. Left: the blue dots are the actual locations. Right: the red circles are the predicted locations; the radius of a circle represents the predicted epicenter error (Zhang, Zhang, et al., 2020).
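A hedged Keras sketch of such a waveforms-to-location-map network follows. The station count, window length, output grid, and layer sizes are invented for illustration and do not reproduce the cited architecture.

```python
# Illustrative sketch only: multistation waveforms in, epicenter probability map out.
from tensorflow import keras
from tensorflow.keras import layers

n_stations, n_samples, grid = 10, 2048, 64        # assumed dimensions
inp = keras.Input(shape=(n_samples, n_stations))  # one channel per station
x = layers.Conv1D(32, 7, strides=4, activation="relu")(inp)
x = layers.Conv1D(64, 7, strides=4, activation="relu")(x)
x = layers.Conv1D(64, 7, strides=4, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dense(grid * grid, activation="sigmoid")(x)
out = layers.Reshape((grid, grid, 1))(x)          # probability map over the study area
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Training pairs would be waveform windows and synthetic "heat maps" peaked at known epicenters; the predicted map's spread then conveys the location error shown in Figure 21.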
3.3. Remote Sensing—a Geophysical Data Observation Means
Remote sensing is an important means of collecting geophysical data and images using sensors on satellites or aircraft. Remote sensing imagery mainly includes optical images, hyperspectral images, and synthetic aperture radar (SAR) images. Large-scale, high-resolution satellite optical color imagery can be used for precision agriculture and urban planning. To address the issue of object rotation variations, a rotation-invariant CNN for object detection in very high-resolution optical remote sensing images was proposed, where a rotation-invariant layer was introduced by enforcing the training samples before and after rotation to share the same features (Cheng et al., 2016). If the labels are not accurate, a two-step training approach can be used in which the CNN is first initialized with numerous inaccurate reference data and then refined on a small amount of correctly labeled data (Maggiori et al., 2017). To further improve image resolution, image contours were extracted with an edge-enhancement GAN to remove the artifacts and noise in super-resolution (Jiang et al., 2019).
Images obtained by hyperspectral sensors have rich spectral information, such that different land cover categories can potentially be precisely differentiated. In recent years, numerous works have explored DL methods for hyperspectral image classification (Li, Song, et al., 2019). To consider the spectral-spatial structure simultaneously, a 3D CNN rather than a 2D one should be used to extract effective features from hyperspectral imagery (Chen, Jiang, et al., 2016). The extracted features are useful for image classification and target detection and open a new window for future research. An alternative means of exploring the relationships among different spectral channels is to use an RNN, which treats hyperspectral pixels as sequential input data (Mou et al., 2017).
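The spectral-spatial idea can be sketched with a small Conv3D network in Keras, where a patch of pixels together with its full spectrum is treated as a 3D volume. The patch size, band count, kernel shapes, and class count below are illustrative assumptions, not the cited architecture.

```python
# Illustrative spectral-spatial 3D CNN for hyperspectral patch classification.
from tensorflow import keras
from tensorflow.keras import layers

patch, bands, n_classes = 9, 100, 16                 # assumed cube size and class count
inp = keras.Input(shape=(patch, patch, bands, 1))    # spatial x spatial x spectral cube
x = layers.Conv3D(8, (3, 3, 7), activation="relu")(inp)   # joint spectral-spatial filters
x = layers.Conv3D(16, (3, 3, 5), activation="relu")(x)
x = layers.Flatten()(x)
out = layers.Dense(n_classes, activation="softmax")(x)
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

A 2D CNN would collapse the spectral axis into channels; keeping it as a third convolution axis is what lets the filters respond to spectral shape as well as spatial texture.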
SAR systems artificially enlarge the aperture of a radar to produce high-resolution images and can operate in all-weather, day-and-night conditions. CNNs have been used for target classification in SAR images, avoiding handcrafted features and providing higher accuracy (Chen, Wang, et al., 2016). To exploit both the amplitude and phase information of complex SAR imagery, a complex-valued CNN was proposed for SAR image classification with complex-valued inputs (Zhang, Wang, et al., 2017).
3.4. Other AI Geophysical Applications
We investigate more AI geophysical applications in this section. The topics are roughly arranged in order from the Earth to outer space.
3.4.1. The Earth's Structure
Understanding the structure of the Earth is a challenging task since observations are mainly limited to the Earth's surface. The Earth is roughly divided, from the surface inward, into the crust, mantle, and core; however, the detailed structures and properties of the Earth's interior remain unclear. Soil moisture, an important soil attribute, has been hindcast with high fidelity from two recent years of satellite data, showing the potential of LSTMs for hindcasting, data assimilation, and weather forecasting (Fang et al., 2017; Fang, Kifer, et al., 2020). High-resolution 3D CT data are required to determine rock properties but offer only a small field of view; a CycleGAN was proposed to obtain super-resolution images from low-resolution ones by training on an unpaired data set (Niu et al., 2020). Volcanic deformation was detected by using a CNN to classify interferometric fringes in wrapped interferograms (Anantrasirichai et al., 2018). The crustal thickness in eastern Tibet and the western Yangtze craton was estimated from Rayleigh surface wave velocities with a DNN (Cheng et al., 2019). The mantle thermal state of simplified model planets was predicted with DL at an accuracy of 99% for both the mean mantle temperature and the mean surface heat flux relative to the calculated values (Shahnas & Pysklywec, 2020).
3.4.2. Water Resources
Water on Earth has a great impact on ecosystems and natural disasters. DL can help address several major challenges in the water sciences (Shen, 2018). DL can predict the loop current in the ocean by learning patterns in sea surface height (SSH): an LSTM was proposed to predict SSH and the loop current in the Gulf of Mexico within 40 kilometers nine weeks in advance (Wang, Zhuang, et al., 2019). Due to limited computational memory, the region of interest was split into different subregions. Further works directly reconstruct SSH over a large spatial and temporal extent from sparsely sampled data with a CNN (Manucharyan et al., 2021). By using observations from satellites and coastal stations simultaneously, a GAN can be used to reconstruct the SSH of the whole North Sea (Zhang, Stanev, et al., 2020). DL also helps estimate iceberg distribution in the pan-Antarctic near-coastal zone, covering the whole Antarctic continent, for monitoring ice melt and sea level rise (Barbat et al., 2019), and map coastal inundation for a better understanding of the geospatial and temporal characteristics of coastal flooding (Liu et al., 2019).
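A minimal sketch of such a sequence-to-map setup is shown below: an LSTM that takes several past weekly SSH maps of one subregion and predicts the map at a later lead time. The subregion size, history length, and layer width are assumptions for illustration only.

```python
# Illustrative LSTM for SSH prediction on one flattened subregion.
from tensorflow import keras
from tensorflow.keras import layers

n_history, n_pixels = 12, 40 * 40           # assumed: 12 past weekly maps, 40x40 subregion
inp = keras.Input(shape=(n_history, n_pixels))
x = layers.LSTM(256)(inp)                   # summarizes the temporal evolution of SSH
out = layers.Dense(n_pixels)(x)             # SSH map at the target lead time
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
```

Splitting the domain into subregions, as in the cited work, keeps n_pixels small enough for memory; a convolutional-recurrent variant would be the natural next step.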
In addition to the oceans, water is stored in different forms, such as rivers, lakes, rain, and ice. DL has found roles in estimating groundwater storage (Sun et al., 2019) and water storage across the US (Sun, Scanlon, et al., 2020), measuring accurate river widths by super-resolution (Ling et al., 2019), predicting lake water temperature (Read et al., 2019), predicting rainfall and runoff (Akbari Asanjan et al., 2018), and retrieving water vapor from remote sensing data (Acito et al., 2020).
3.4.3. Atmospheric Science
Atmospheric science observes and predicts climate, weather, and atmospheric phenomena. Global observation of atmospheric parameters is difficult since the Earth is extremely large and sensor locations are limited. Researchers chose a CNN-based inpainting algorithm to reconstruct missing values in global climate data sets such as HadCRUT4 (Kadow et al., 2020, Figure 22). Air pollution damages both the Earth's environment and human health; researchers have used DL to estimate ground-level PM2.5 or PM10 concentrations from satellite observations and station measurements (Li et al., 2017; Shen et al., 2018; Tang et al., 2018).

Figure 22. Artificial intelligence (AI) models reconstruct temperature anomalies with many missing values (Kadow et al., 2020).
DL also helps improve the accuracy of weather forecasting, a long-standing challenge in atmospheric science (Bonavita & Laloyaux, 2020; Scher & Messori, 2021). The tracks of typhoons were predicted with a GAN based on satellite images (Rüttgers et al., 2019), producing six-hour-advance tracks with an average error of 95.6 km. Flow-dependent typhoon-induced sea surface temperature cooling was estimated by a DNN and used to improve typhoon predictions (Jiang et al., 2018).
3.4.4. Space Science
Global space parameter estimation and prediction are long-standing tasks in space science. Researchers used a DNN to predict short-term and long-term 3D dynamic electron densities in the inner magnetosphere (Chu et al., 2017); this network can provide the magnetospheric plasma density at any time and location. A regularized GAN was used to reconstruct dynamic total electron content (TEC) maps (Chen et al., 2019), with several existing maps used as references to interpolate missing values in some regions, such as over the oceans. TEC maps can also be predicted two hours in advance with an LSTM (Liu et al., 2020) or one day in advance with a GAN (Lee et al., 2021). Further, a DNN was used to estimate the relationship between electron temperature and electron density in small regions (Hu et al., 2020); because the global electron density is easily measured, it can then be used to predict the global electron temperature.
The geomagnetic storm can be predicted with an LSTM with uncertainty estimation (Tasistro-Hart et al., 2020), providing confidence in the output.
An aurora is an astronomical phenomenon commonly observed in polar areas. Auroras are caused by disturbances in the magnetosphere driven by the solar wind. Auroral classification is important for polar and solar wind research. Researchers used a DNN to classify auroral images (Clausen & Nickisch, 2018, Figure 23). The classification results can further be used to produce an auroral occurrence distribution (Zhong et al., 2020). To handle the situation in which only limited images were annotated, a CycleGAN model was used to extract key local structures from all-sky auroral images (Yang et al., 2019).

Figure 23. The bottom panel shows a keogram from auroral data collected on January 21, 2006, at Rankin Inlet. The keogram consists of a single column from the auroral images at different times. The middle panel shows the probabilities for the six categories as predicted by the ridge classifier trained with the entire training data set. At the top are auroral images at different times (Clausen & Nickisch, 2018).
4. Future Trends and Directions for DL in Geophysics
4.1. The Development Trends of DL in Geophysics
The landmark achievements of DL appeared after 2015, such as VGGNet (Simonyan & Zisserman, 2015), ResNet (He et al., 2016), AlexNet (Krizhevsky et al., 2017), and AlphaGo in 2016. The first introduction of DL in subjects related to geophysics focused on remote sensing in 2016 and 2017 (Chen, Jiang, et al., 2016; Chen, Wang, et al., 2016; Maggiori et al., 2017; Li et al., 2017), since remote sensing is a common technique widely used in many areas. In 2018 and 2019, more geophysical areas, such as exploration geophysics (Araya-Polo et al., 2018) and earthquake studies (Mousavi, Zhu, Sheng, & Beroza, 2019), started to employ DL.
The first attempts started with simple FCNN methods, followed by more complex networks, such as CNN, RNN, and GAN models. With respect to training, early works used end-to-end supervised training borrowed from computer vision, which requires a large number of annotated labels, while recent works have started to consider unsupervised learning (He et al., 2018) and the combination of DL with physical models (Chattopadhyay et al., 2020; Wu & McMechan, 2019). In 2020, more works focused on the uncertainty of DL methods (Cao et al., 2020; Grana et al., 2020; Mousavi & Beroza, 2020a). More examples are listed in Table 2. From these trends, we can conclude that an increasing number of researchers are trying to develop DL methods specifically designed for geophysical tasks to make DL more practical. In the next subsection, we introduce these future trends in detail.

Table 2
Examples of Literature That Use Different Network Architectures (CNN, CAE, U-Net, GAN, and RNN) for Tasks Beyond End-to-End Training

Supervised (end-to-end): Yu et al. (2019); Dhara and Bagaini (2020); Wang, Wang, et al. (2020); Yang and Ma (2019); Wu, Shi, et al. (2019); Siahkoohi et al. (2019); Yuan et al. (2020); Linville et al. (2019)
Semi/unsupervised: Duan et al. (2019); Niu et al. (2020)
Optimization oriented: Xiao et al. (2021); Sun and Alkhalifah (2020); Mousavi, Zhu, Ellsworth, et al. (2019); Sun, Niu, et al. (2020); Wang, McMechan, et al. (2020)
Physical constraint: Zhang, Yang, et al. (2020); Wu and McMechan (2019)
Uncertainty estimation: Mousavi and Beroza (2020a); Tasistro-Hart et al. (2020); Grana et al. (2020)

Note. Here, optimization oriented means using DNNs to optimize traditional model-driven objective functions.
4.2. Future Directions for Deep Learning in Geophysics
DL, as an efficient artificial intelligence technique, is expected to discover geophysical concepts and inherit expert knowledge through machine-assisted mathematical algorithms. Despite the success of DL in some geophysical applications, such as earthquake detectors and pickers, its use as a tool for most practical geophysics is still in its infancy. The main problems include a shortage of training samples, low signal-to-noise ratios, and strong nonlinearity. Among these issues, the critical challenge is the lack of training samples in geophysical applications compared to other industries. Several advanced DL methods have been proposed to address this challenge, such as semi-supervised and unsupervised learning, transfer learning,
multimodal DL, federated learning, and active learning. We suggest that a focus be placed on the subjects
below for future research in the coming decade.
4.2.1. Semi-Supervised and Unsupervised Learning
In practical geophysical applications, obtaining labels for a large data set is time-consuming and can even be infeasible. Therefore, semi-supervised or unsupervised learning is required to relieve the dependence on labels. Dunham et al. (2019) focused on the application of semi-supervised learning in a situation in which the available labels were scarce. A self-training-based label propagation method was proposed, and it outperformed supervised learning methods in which the unlabeled samples were neglected. Semi-supervised learning takes advantage of both labeled and unlabeled data sets. The combination of an AE and K-means is an efficient unsupervised learning method (He et al., 2018; Qian et al., 2018): an autoencoder learns low-dimensional latent features in an unsupervised way, and then K-means clusters the latent features.
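This two-step recipe can be sketched compactly; the sketch below assumes flattened image patches as input, and the latent size and cluster count are arbitrary choices.

```python
# Hedged sketch: autoencoder latent features followed by K-means clustering.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.cluster import KMeans

x = np.random.rand(1000, 64 * 64)                 # placeholder unlabeled patches

inp = keras.Input(shape=(64 * 64,))
z = layers.Dense(32, activation="relu")(inp)      # low-dimensional latent features
rec = layers.Dense(64 * 64, activation="sigmoid")(z)
autoencoder = keras.Model(inp, rec)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=20, batch_size=64, verbose=0)  # unsupervised: input = target

encoder = keras.Model(inp, z)
labels = KMeans(n_clusters=6).fit_predict(encoder.predict(x))  # cluster the latent codes
```

No labels enter at any point; the cluster assignments can then be inspected by an interpreter, for example, as candidate seismic facies.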
4.2.2. Transfer Learning
Usually, we must train one DNN for a specific data set and a specific task. For example, a DNN may effectively process land data but not marine data, or it may be effective in fault detection but not in facies classification. Transfer learning (Donahue et al., 2014) is suggested to increase the reusability of a trained network across different data sets or tasks.

In transfer learning with different data sets, the optimized parameters for one data set can be used as initialization values for learning a new network with another data set; this process is called fine-tuning. Fine-tuning is typically much faster and easier than training a network with randomly initialized weights from scratch. In transfer learning involving different tasks, we assume that the extracted low-level features should be the same across tasks. Therefore, the first layers of a model trained for one task are copied to the new model for another task to reduce the training time. Another benefit of transfer learning is that, with a small number of training samples, we can promptly transfer the learned features to a new task or a new data set. Diagrams of these two transfer learning methods are shown in Figure 24. Further topics in transfer learning include the relationship between the transferability of features (Yosinski et al., 2014) and the distance between different tasks and data sets (Oquab et al., 2014).

Figure 24. Diagrams of transfer learning. (a) Transfer learning between different data sets: the parameters of one trained model are used to initialize another model. (b) Transfer learning between different tasks: the first layers of one trained model are copied to another model.
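In Keras, both flavors reduce to reusing trained weights. The sketch below is illustrative: the base network, the weight file name, and the four-class new task are our assumptions.

```python
# Hedged sketch of the two transfer learning flavors in Figure 24.
from tensorflow import keras
from tensorflow.keras import layers

# stand-in for a network already trained on one data set/task
base = keras.Sequential([layers.Dense(64, activation="relu", input_shape=(32,)),
                         layers.Dense(64, activation="relu"),
                         layers.Dense(1)])
# base.load_weights("land_data_model.h5")   # hypothetical pretrained weights

# (a) New data set, same task: keep all weights and fine-tune with a small learning rate.
base.compile(optimizer=keras.optimizers.Adam(1e-4), loss="mse")

# (b) New task: freeze the first (feature-extracting) layers and retrain a new head.
features = keras.Model(base.inputs, base.layers[-2].output)
features.trainable = False
out = layers.Dense(4, activation="softmax")(features.output)   # e.g., facies classes
classifier = keras.Model(features.inputs, out)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```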
4.2.3. Combination of DL and Traditional Methods
Can we combine traditional and DL approaches to make geophysical mechanics and DL collaborate? Intuitively, such a combination can produce a more precise result than traditional methods and a more reliable result than DL methods alone.
How can DL be incorporated into traditional methods? In a traditional iterative optimization algorithm, the thresholding-based denoiser can be replaced by a DL denoiser (Zhang, Zuo, Gu, et al., 2017) such that the reconstructed results are improved; moreover, different tasks can then share the same denoiser without training a new one for each task. Another technique, deep image prior (DIP), uses a DNN architecture as a constraint on the data and couples it with traditional physical models for different tasks (Lempitsky et al., 2018). Similar to the idea of DIP, Wu and McMechan (2019) showed that a DNN generator can be added to an FWI framework. First, a U-Net-based generator F(Θ; v) with random input v was used to approximate a velocity model m with high accuracy. Then, F(Θ; v) was inserted into the FWI objective function,

E_FWI(Θ) = (1/2) ‖P F(Θ; v) − d_r‖₂²,  (6)

where d_r is the seismic record and P is the forward wavefield propagator. The gradient of E_FWI with respect to the network parameters Θ is calculated with the chain rule. The U-Net is used only to regularize the velocity model; after training, one forward propagation of the network produces a regularized result.
Traditional optimization methods also benefit from the automatic differentiation mechanism in DL, which makes optimization more efficient by replacing conjugate gradient or L-BFGS solvers with DL optimizers such as SGD and Adam (Sun, Niu, et al., 2020; Wang, Chang, et al., 2020). DL has also inspired new directions in the study of traditional nonlinear optimization algorithms, such as ML-descent (Sun & Alkhalifah, 2020) and DL-based adjoint state methods (Xiao et al., 2021).
How can traditional methods be incorporated into DL? With an additional physical constraint on DL methods, fewer training samples are required to obtain a more generalizable inference than with purely data-driven training. Raissi et al. (2019) proposed the physics-informed neural network (PINN), which combines training data and physical-equation constraints during training. Taking wave modeling as an example, the wavefield is represented with a DNN, u(x, t) = F(x, t; Θ), such that the acoustic wave equation becomes

u_tt − c²∇²u = F_tt(x, t; Θ) − c²∇²F(x, t; Θ) = 0,  (7)

which is enforced as a residual term in the loss function.
How can DL and traditional methods cooperate? Another benefit of combining data-driven and model-driven approaches is that we can obtain high-resolution solutions on a large scale. The large-scale process is numerically solved on a low-resolution grid based on physical equations, while the small-scale process is solved by data-driven DL methods (Chattopadhyay et al., 2020). Therefore, the high computational demand of a uniformly fine grid is avoided. DL can also be used for discovering physical concepts (Iten et al., 2020).
It is more common to hear someone ask, "Does machine learning have a real role in hydrological modeling?" rather than, "What role will hydrological science play in the age of machine learning?" (Nearing et al., 2020). As those authors argue, DL has uncovered principles in large-scale rainfall-runoff simulation that physical models cannot yet explain. DL is having a great impact on traditional methods, producing a collision between new and old ideas. We believe that DL and physics-based methods will be used together to move science forward for a long time to come.
4.2.4. Multimodal Deep Learning
To improve the resolution of inversion, the joint inversion of data from different sources has been a popular topic in recent years (Garofalo et al., 2015). One of the advantages of DNNs is that they can fuse information from multiple inputs. In multimodal DL (Ngiam et al., 2011; Ramachandram & Taylor, 2017), inputs come from different sources, such as seismic data and gravity data. Collecting data from different sources can help relieve the bottleneck of a limited number of training samples. Besides, using multimodal data sets can increase the quality and reliability of DL methods (Zhang, Stanev, et al., 2020). Feng, Fang, et al. (2020) used data integration to forecast streamflow, where 23 variables were used, such as precipitation, solar radiation, and temperature. Figure 25 shows an illustration of multimodal DL.

Figure 25. An illustration of multimodal deep learning.
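A hedged Keras sketch of a two-branch multimodal network follows, fusing a seismic image branch and a gravity-measurement branch by concatenating their learned features; all shapes and the regression target are invented for illustration.

```python
# Illustrative multimodal fusion: two input branches, one shared head.
from tensorflow import keras
from tensorflow.keras import layers

seis = keras.Input(shape=(128, 128, 1), name="seismic")   # assumed image-like modality
grav = keras.Input(shape=(64,), name="gravity")           # assumed vector modality

s = layers.Conv2D(16, 3, strides=2, activation="relu")(seis)
s = layers.Conv2D(32, 3, strides=2, activation="relu")(s)
s = layers.GlobalAveragePooling2D()(s)
g = layers.Dense(32, activation="relu")(grav)

fused = layers.Concatenate()([s, g])      # feature-level fusion of the two modalities
out = layers.Dense(1)(fused)              # e.g., a subsurface property to be inverted
model = keras.Model([seis, grav], out)
model.compile(optimizer="adam", loss="mse")
```

Each branch can be sized to its own modality, which is what distinguishes this feature-level fusion from simply stacking the raw inputs.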
4.2.5. Federated Learning
To provide a practical training set for DL in geophysical applications, collecting available data sets from different institutes or corporations might be a possible solution. However, data transfer via the internet is time-consuming and expensive for large-scale geophysical data sets. Besides, most data sets are protected and cannot be shared. Federated learning was first proposed by Google (McMahan et al., 2017; Li et al., 2020) to train a DNN with user data from millions of cellphones without privacy or security issues. The encrypted gradients from different clients are aggregated on a central server, thus avoiding raw data transfer. The server updates the model and distributes it to all clients (Figure 26). In a simple federated learning setting, the clients and the server share the same network architecture. A possible example of federated learning in geophysics: some corporations do not share their annotations of first arrivals; however, they could still benefit by jointly training a DNN for first-arrival picking through federated learning.

Figure 26. Federated learning. The clients train the deep neural network (DNN) with local data sets and upload the model gradients to the server. The server aggregates the gradients and updates the global model. Then, the updated model is distributed to all the local clients. Many rounds of training are performed until the model meets a certain accuracy requirement.
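The aggregation step at the heart of this scheme can be sketched in a few lines. This is plain federated averaging without the encryption layer; the weighting by local data set size is the standard FedAvg choice.

```python
# Hedged sketch of the server-side FedAvg aggregation step.
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-layer weight arrays across clients, weighted by local data size."""
    total = float(sum(client_sizes))
    return [sum(w[k] * n / total for w, n in zip(client_weights, client_sizes))
            for k in range(len(client_weights[0]))]

# one round, assuming Keras models on each client (names hypothetical):
# client_weights = [m.get_weights() for m in client_models]   # trained locally
# global_weights = federated_average(client_weights, client_sizes)
# for m in client_models: m.set_weights(global_weights)       # redistribute
```

Only weights (or gradients) cross the network; the raw first-arrival annotations never leave each corporation's servers.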
4.2.6. Uncertainty Estimation
One of the remaining questions associated with applying DL in geophysics is whether the results of DL-based methods, which lack a solid theoretical foundation, can be trusted. DL-based uncertainty analysis methods include Monte Carlo dropout (Gal & Ghahramani, 2016), Markov chain Monte Carlo (MCMC) (de Figueiredo et al., 2019), variational inference (Subedar et al., 2019), etc. For example, in Monte Carlo dropout, dropout layers are added after the original layers to simulate a Bernoulli distribution and are kept active at inference time. The results of multiple dropout realizations are collected, and their variance is taken as the uncertainty. DL with uncertainty estimation in inference has been reported in areas such as volcano-seismic monitoring (Bueno et al., 2019), geomagnetic storm forecasting (Tasistro-Hart et al., 2020), weather forecasting (Scher & Messori, 2021; Bonavita & Laloyaux, 2020), soil moisture prediction (Fang, Kifer, et al., 2020), and earthquake location estimation (Mousavi & Beroza, 2020b).
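A minimal Keras sketch of Monte Carlo dropout: dropout is kept active at prediction time by calling the model with training=True, and the spread over repeated stochastic passes is reported as the uncertainty. The layer sizes and dropout rate are arbitrary.

```python
# Hedged sketch of Monte Carlo dropout for predictive uncertainty.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(64,))
x = layers.Dense(128, activation="relu")(inp)
x = layers.Dropout(0.2)(x)          # Bernoulli mask, resampled on every forward pass
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.2)(x)
out = layers.Dense(1)(x)
model = keras.Model(inp, out)

x_test = np.random.rand(32, 64).astype("float32")
# training=True keeps dropout active, giving a different realization each call
samples = np.stack([model(x_test, training=True).numpy() for _ in range(100)])
mean, uncertainty = samples.mean(axis=0), samples.std(axis=0)
```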
4.2.7. Active Learning
To train a high-precision model using a small amount of labeled data, active learning has been proposed to imitate the self-learning ability of human beings (Yoo & Kweon, 2019). An active learning model selects the most useful data for manual annotation based on a sampling strategy and adds these data to the training set; the updated data set is then used for the next round of training (Figure 27). One common sampling strategy is based on uncertainty, that is, the samples with the highest predictive uncertainty are selected. Taking fault detection as an example, if a trained network is not sure whether a fault exists at a given location, we can annotate that location manually and add the sample to the training set.

Figure 27. An illustration of active learning. Samples with high uncertainty are chosen and manually annotated to serve as training samples.
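One round of uncertainty-based sampling can be sketched as follows; the model and the annotation (oracle) step are placeholders, and entropy is one of several reasonable uncertainty measures.

```python
# Hedged sketch of one active learning query round (entropy sampling).
import numpy as np

def select_for_annotation(model, pool_x, n_query=20):
    """Pick the pool samples the classifier is least sure about."""
    probs = model.predict(pool_x)                              # (n_pool, n_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)   # high entropy = uncertain
    return np.argsort(entropy)[-n_query:]                      # indices to hand to an expert

# loop: train on the labeled set -> query -> expert annotates -> add to labeled set -> retrain
```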
5. Summary
In this review, the key concepts of DL approaches are introduced, a broad range of applications of DL in geophysics is presented with their pros and cons, and the future trends are discussed for geophysical readers who are beginning their journey in DL. DL methods have created both opportunities and challenges in geophysical fields. Pioneering researchers have provided a basis for DL in geophysics with promising results; more advanced DL technologies and more practical problems must now be explored. To close this study, we summarize a roadmap for applying DL in different geophysical tasks in terms of three levels:

1. Traditional methods are time-consuming and require intensive human labor and expert knowledge, such as in first-arrival selection and velocity selection in exploration geophysics.
2. Traditional methods face difficulties and bottlenecks. For example, geophysical inversion requires good initial values and high-accuracy modeling and suffers from local minima.
3. Traditional methods cannot handle some cases, such as multimodal data fusion and inversion.
With the development of new artificial intelligence models beyond DL and advances in research into the
infinite possibilities of applying DL in geophysics, we can expect intelligent and automatic discoveries of
unknown geophysical principles soon.
Appendix A: A Deep Learning Tutorial for Beginners
A Coding Example of a DnCNN
The implementation of DL algorithms in geophysical data processing is quite simple with existing frameworks such as Caffe, PyTorch, Keras, and TensorFlow. Here, we provide an example of how to use Python and Keras to construct a DnCNN for seismic denoising. The core code requires roughly a dozen lines for data set loading, model construction, training, and testing. The data set is preconstructed and includes a clean subset and a noisy subset; the overall data set includes 12,800 samples with a size of 64 × 64 (available at https://bit.ly/33SyXPO).
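A minimal Keras sketch of such a DnCNN is given below. The file names, block count, and residual-learning setup are illustrative assumptions, not the exact listing or configuration behind the linked data set.

```python
# Hedged DnCNN sketch for seismic denoising; file names are hypothetical.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

noisy = np.load("noisy.npy")   # assumed shape (12800, 64, 64, 1)
clean = np.load("clean.npy")

inp = keras.Input(shape=(64, 64, 1))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
for _ in range(15):                                   # stacked Conv-BN-ReLU blocks
    x = layers.Conv2D(64, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
residual = layers.Conv2D(1, 3, padding="same")(x)     # network predicts the noise
out = layers.Subtract()([inp, residual])              # residual learning: clean = noisy - noise
model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
model.fit(noisy, clean, batch_size=64, epochs=30, validation_split=0.2)
denoised = model.predict(noisy[:16])
```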
Any appropriate plotting tool can be used for data visualization. The training takes less than one hour on an NVIDIA 2080 Ti graphics processing unit. Readers can try this code in their own areas as long as a compatible training set is constructed.
Tips for Beginners
We introduce several practical tips for beginners who want to explore DL in geophysics, organized around the three most critical steps in DL: data generation, network construction, and training. Though exploration geophysics is used as the example, the tips for data generation and network training are generally applicable to most areas; network construction generally depends on the task.
Data Generation
As noted by Poulton (2002), "training a feed-forward neural network is 10% of the effort involved in an application; deciding on the input and output data coding and creating good training and testing sets is 90% of the work." In DL, we advise that the percentages of effort for network construction and data set preparation should be roughly 40% and 60%. First, most DL approaches use the original data as input, thus reducing feature extraction efforts. Second, a wider variety of network architectures and parameters can be used in DL compared to traditional neural networks. Overall, constructing a proper training set plays the more prominent role in DL.
Synthetic data sets can be used effectively in DL, which is advantageous since labeled real data sets are sometimes difficult to obtain. First, to assess the applicability of DL in a specific geophysical application, using synthetic data sets is the most convenient approach. Second, if a satisfactory result is obtained with synthetic data sets, a few annotated real data sets can be used for transfer learning via parameter tuning. Third, if the synthetic data sets are sufficiently complicated, that is, if the most important factors are considered when generating them, the trained network may be able to process realistic data sets directly (Wu et al., 2020; Wu, Liang, et al., 2019).
A synthetic training set should be diverse. First, we suggest using an existing synthetic data set with an open license instead of generating a new one; for specific tasks, such as FWI, a data set may need to be generated based on a wave equation. Second, data augmentation methods, such as rotation, reflection, scaling, translation, and adding noise, missing traces, or faults to clean data sets, can be used to expand the training set, as sketched below. The goal is to generate extremely large synthetic data sets that are as close to realistic data sets as possible.
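Several of these augmentations can be written directly in NumPy; the noise level and the decimation pattern below are arbitrary illustrative choices.

```python
# Hedged augmentation sketch for 2D patches (e.g., small seismic sections).
import numpy as np

def augment(patch, rng=np.random.default_rng()):
    """Expand one clean 2D patch into several training variants."""
    out = [patch,
           np.fliplr(patch),                                  # reflection
           np.rot90(patch, 2),                                # 180-degree rotation
           patch + 0.1 * rng.standard_normal(patch.shape)]    # additive noise
    decimated = patch.copy()
    decimated[:, ::4] = 0.0                                   # simulate missing traces
    out.append(decimated)
    return out
```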
To generate realistic data sets, we suggest using existing methods to generate labels that are then checked by a human. For example, in first-arrival picking, an automatic picking algorithm is used to preprocess the data sets, and the results are then given to an expert who identifies the outliers. We also suggest using active learning (Yoo & Kweon, 2019) to provide a semiautomated labeling procedure: first, all machine-annotated data are used to train a DNN, and then the samples with high predicted uncertainty are manually annotated.
Network Construction for Different Tasks
We suggest that beginners use a DnCNN or U-Net for initial tests. DnCNNs are suitable for most tasks in which the input and output share the same domain, such as denoising, interpolation, and attribute analysis. The input size of a DnCNN can vary since no pooling layers are involved; however, each output point is determined by a local receptive field of the input rather than by the entire input. A U-Net, in contrast, contains pooling layers, so all input points can contribute to an output point. U-Nets are suitable even for tasks in which the inputs and outputs lie in different domains, such as FWI. However, the input size of a U-Net is fixed once trained, and the data need to be processed patch-wise.
Combining a CAE and K-means is suggested for unsupervised clustering tasks, such as attribute classification. We do not suggest CycleGAN for geophysical tasks since the training process is extremely time-consuming and the results are not stable. An RNN provides a high-performance framework for time-dependent tasks, such as forward wave modeling and FWI. RNNs are also used for regression and classification tasks involving temporally or spatially sequential data, such as the denoising of a single trace.
To adjust the hyperparameters of a DNN and its optimization algorithm, we suggest using an autoML toolbox, such as AutoKeras, instead of manually adjusting the values. The basic objective is to search for the best parameter combination within a given sampling range. Such a search is exceptionally time-consuming, and a random search strategy may accelerate the tuning process. Moreover, for most applications, the default architecture gives reasonable results.
Training, Validation, and Testing
The available data should be split into three subsets: a training set, a validation set, and a test set. The proportions of the subsets depend on the overall size of the data set. For data sets with 10-50K samples, proportions of 60%, 20%, and 20% are suggested, respectively. For larger data sets (for instance, those larger than 1M samples), much smaller portions (1%-5%) are often used for validation and testing, since larger test/validation sets would waste data that could otherwise be used for training a better model. In a classification task, we suggest using one-hot coding for the labels. The validation set is used to test the network during training; the model with the best validation accuracy is then selected rather than the final trained model. If the validation accuracy stops improving, or decreases after saturating, an early stopping strategy is suggested to avoid overfitting. Network hyperparameters should be tuned according to the validation accuracy. The validation set guides training, while the test set evaluates the model on unseen data; the test set should never be used for hyperparameter tuning.
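These conventions map directly onto a few standard calls; a hedged sketch, with placeholder data standing in for a real data set:

```python
# Hedged sketch: 60/20/20 split, early stopping, and best-validation checkpointing.
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

x, y = np.random.rand(1000, 64), np.random.rand(1000)   # placeholders for real data
x_train, x_tmp, y_train, y_tmp = train_test_split(x, y, test_size=0.4, random_state=0)
x_val, x_test, y_val, y_test = train_test_split(x_tmp, y_tmp, test_size=0.5,
                                                random_state=0)

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10),     # stop on saturation
    keras.callbacks.ModelCheckpoint("best.h5", save_best_only=True),    # keep best-validation model
]
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=200, callbacks=callbacks)
# model.evaluate(x_test, y_test)   # the test set is touched only once, at the very end
```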
Two commonly seen issues during training are as follows: the validation loss is less than the training loss, and the loss becomes not-a-number (NaN). Intuitively, the training loss should be less than the validation loss since the model is trained on the training data. Several potential reasons for the first issue are as follows: (1) regularization, such as dropout, is applied during training but disabled during validation; (2) the training loss is averaged over the batches within an iteration, while the validation loss is computed after the iteration; and (3) the validation set may be less complicated than the training set, especially when only the training set has been augmented. The potential reasons for a NaN loss are as follows: (1) the learning rate is too high; (2) in an RNN, the gradients should be clipped to avoid gradient explosion; and (3) zero is used as a divisor, a negative value is passed to a logarithm, or an exponent is given too large a value.
Glossary
AE Autoencoder; an ANN trained to reproduce its inputs at its outputs.
AI Artificial intelligence; machines taught to think like humans.
ANN Artificial neural network; a computing system inspired by biological neural networks that
constitute animal brains.
Aurora A natural light display in the Earth's sky, caused by disturbances in the magnetosphere driven by the solar wind.
BNN Bayesian neural network; the network parameters are random variables instead of regular
variables.
CAE Convolutional autoencoder; an AE with shared weights.
CNN Convolutional neural network; a DNN with shared weights.
DDTF Data-driven tight frame; A dictionary learning method using a tight frame constraint for the
dictionary.
Deblending In seismic exploration, several explosive sources are shot very close in time to improve efficiency, so the seismic waves from different sources are blended. The recorded data set first needs to be deblended before further processing.
Dictionary A set of vectors used to represent signals as a linear combination.
DIP Deep image prior; the architecture of a DNN is used as a prior constraint for an image.
DL Deep learning; a machine learning technology based on a deep neural network.
DnCNN Denoising convolutional neural network.
DNN Deep neural network; an ANN with many layers between the input and output layers.
DS Double sparsity; the data are represented with a sparse coefficient matrix multiplied by an
adaptive dictionary. The adaptive dictionary is represented by a sparse coefficient matrix
multiplied by a fixed dictionary.
Event In exploration geophysics, a seismic event means reflected waves with the same phase. In seismology, an event means an earthquake that has occurred.
Facies A seismic facies unit is a mapped, three-dimensional seismic unit composed of groups of
reflections whose parameters differ from adjacent facies units.
Fault A discontinuity in a volume of rock across which there has been significant displacement as a result of rock-mass movement.
FCN Fully convolutional network; an FCN is a network that contains no fully connected layers.
Fully connected layers do not share weights.
FCNN Fully connected neural network; an FCNN is a network composed of fully connected layers.
FWI Full waveform inversion; full waveform information is used to obtain subsurface parameters.
FWI is achieved based on the wave equation and inversion theory.
GAN Generative adversarial network; GANs are used to generate fake images. A GAN contains a
generative network and a discriminative network. The generative network tries to produce a
nearly real image. The discriminative network tries to distinguish whether the input image
is real or generated. Therefore, such a game will eventually allow the generative network to
produce fake images that the discriminative network cannot distinguish from real images.
Graphics processing unit (GPU) A parallel computing device. GPUs are widely used for training neural networks in deep learning.
HadCRUT4 Temperature records from the Hadley Centre (sea surface temperature) and the Climatic Research Unit (land surface air temperature).
K-means A classical clustering algorithm, where K is the number of clusters.
K-SVD A dictionary learning method using SVD for dictionary updating.
LSTM Long short-term memory; an LSTM considers how much historical information is forgotten or remembered with adaptive gates.
Magnetosphere Range of the magnetic field surrounding an astronomical object where charged particles
are affected.
ML Earthquake local magnitude; a method for measuring earthquake scale.
Patch In dictionary learning, an image is divided into many patches (blocks) that are the same size
as the atoms in a dictionary.
PINN Physics-informed neural network; a physical equation is used to constrain the neural network.
PM Particulate matter. PM10 refers to coarse particles with a diameter of 10 micrometers or less; PM2.5 refers to fine particles with a diameter of 2.5 micrometers or less.
ResNet Residual neural network; ResNets contain skip connections to jump over several layers. The
output of a residual block is the residual between the input and the direct output.
RNN Recurrent neural network; in time-sequenced data processing applications, RNNs use the
output of a network as the input of the subsequent process to consider the historical context.
SAR Synthetic aperture radar; the motion of a radar antenna over a target is treated as an antenna
with a large aperture. The larger the aperture is, the higher the image resolution will be.
Solar wind A stream of charged particles released from the upper atmosphere of the Sun.
Sparse coding Input data are represented in the form of a linear combination of a dictionary where the
coefficients are sparse.
Sparsity The number of nonzero values in a vector.
SVD Singular value decomposition; a matrix factorization method, A = USV^T, where U and V are orthogonal matrices and S is a diagonal matrix whose elements are the singular values of A. SVD is used for dimension reduction by removing the smaller singular values. SVD is also used in recommendation systems and natural language processing.
Tight frame A frame provides a redundant, stable way of representing a signal, similar to a dictionary. A tight frame is a frame with the perfect reconstruction property, that is, W^T W = I.
Tomography Inversion of the subsurface velocity based on travel time information.
U-Net U-shaped network; U-Nets have U-shaped structures and skip connections. The skip connections bring low-level features to high levels.
Wave equation A partial differential equation that controls wave propagation.
WST Wavelet scattering transform; a transform involving a cascade of wavelet transforms, a modulus operator, and an averaging operator.
Data Availability Statement
Data were neither used nor created for this research.
Acknowledgments
The work was supported in part by the NSFC under grant nos. 41625017 and 41804102, and the National Key Research and Development Program of China under grant nos. 2017YFB0202902 and 2018YFC1503705. The authors thank the Society of Exploration Geophysicists, Nature Research, and the American Association for the Advancement of Science for allowing us to reuse the original figures from their journals.

References
Abma, R., & Kabir, N. (2006). 3D interpolation of irregular data with a POCS algorithm. Geophysics, 71(6), 91–97. https://doi.
org/10.1190/1.2356088
Acito, N., Diani, M., & Corsini, G. (2020). CWV-Net: A deep neural network for atmospheric column water vapor retrieval from hyper-
spectral VNIR data. IEEE Transactions on Geoscience and Remote Sensing, 58(11), 8163–8175. https://doi.org/10.1109/tgrs.2020.2987905
Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation.
IEEE Transactions on Signal Processing, 54(11), 4311–4322. https://doi.org/10.1109/tsp.2006.881199
Akbari Asanjan, A., Yang, T., Hsu, K., Sorooshian, S., Lin, J., & Peng, Q. (2018). Short-term precipitation forecast based on the PER-
SIANN system and LSTM recurrent neural networks. Journal of Geophysical Research: Atmosphere, 123(22), 12543–12563. https://doi.
org/10.1029/2018jd028375
Anantrasirichai, N., Biggs, J., Albino, F., Hill, P., & Bull, D. (2018). Application of machine learning to classification of volcanic defor-
mation in routinely-generated InSAR data. Journal of Geophysical Research, 123(8), 6592–6606. https://doi.org/10.1029/2018JB015911
Araya-Polo, M., Jennings, J., Adler, A., & Dahlke, T. (2018). Deep-learning tomography. The Leading Edge, 37(1), 58–66. https://doi.
org/10.1190/tle37010058.1
Barbat, M. M., Rackow, T., Hellmer, H. H., Wesche, C., & Mata, M. M. (2019). Three years of near-coastal Antarctic iceberg distribution
from a machine learning approach applied to SAR imagery. Journal of Geophysical Research: Oceans, 124(9), 6658–6672. https://doi.
org/10.1029/2019jc015205
Bergen, K. J., Johnson, P. A., de Hoop, M. V., & Beroza, G. C. (2019). Machine learning for data-driven discovery in solid earth geoscience.
Science, 363(6433), 1–10. https://doi.org/10.1126/science.aau0323
Bonavita, M., & Laloyaux, P. (2020). Machine learning for model error inference and correction. Journal of Advances in Modeling Earth
Systems, 12(12), e2020MS002232. https://doi.org/10.1029/2020ms002232
Bueno, A., Benitez, C., De Angelis, S., Moreno, A. D., & Ibanez, J. M. (2019). Volcano-seismic transfer learning and uncertainty quantifi-
cation with Bayesian neural networks. IEEE Transactions on Geoscience and Remote Sensing, 58(2), 892–902. https://doi.org/10.1109/
TGRS.2019.2941494
Cai, J., Ji, H., Shen, Z., & Ye, G. (2014). Data-driven tight frame construction and image denoising. Applied and Computational Harmonic
Analysis, 37(1), 89–105. https://doi.org/10.1016/j.acha.2013.10.001
Cao, R., Earp, S., de Ridder, S. A. L., Curtis, A., & Galetti, E. (2020). Near-real-time near-surface 3D seismic velocity and uncertainty
models by wavefield gradiometry and neural network inversion of ambient seismic noise. Geophysics, 85(1), KS13–KS27. https://doi.
org/10.1190/geo2018-0562.1
Chattopadhyay, A., Subel, A., & Hassanzadeh, P. (2020). Data-driven super-parameterization using deep learning: Experimentation with
multiscale Lorenz 96 systems and transfer learning. Journal of Advances in Modeling Earth Systems, 12(11), e2020MS002084. https://
doi.org/10.1029/2020ms002084
Chen, R. T., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural ordinary differential equations. (pp. 6572–6583). In Proceedings of the
32nd International Conference on Neural Information Processing Systems (NIPS'18). https://dl.acm.org/doi/10.5555/3327757.3327764
Chen, S., Wang, H., Xu, F., & Jin, Y. (2016). Target classification using the deep convolutional networks for SAR images. IEEE Transactions
on Geoscience and Remote Sensing, 54(8), 4806–4817. https://doi.org/10.1109/tgrs.2016.2551720
Chen, Y., Jiang, H., Li, C., Jia, X., & Ghamisi, P. (2016). Deep feature extraction and classification of hyperspectral images based on
convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 54(10), 6232–6251. https://doi.org/10.1109/
tgrs.2016.2584107
Chen, Z., Jin, M., Deng, Y., Wang, J.-S., Huang, H., Deng, X., & Huang, C.-M. (2019). Improvement of a deep learning algorithm for total
electron content maps: Image completion. Journal of Geophysical Research, 124(1), 790–800. https://doi.org/10.1029/2018ja026167
Cheng, G., Zhou, P., & Han, J. (2016). Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote
sensing images. IEEE Transactions on Geoscience and Remote Sensing, 54(12), 7405–7415. https://doi.org/10.1109/tgrs.2016.2601622
Cheng, X., Liu, Q., Li, P., & Liu, Y. (2019). Inverting Rayleigh surface wave velocities for crustal thickness in eastern Tibet and the west-
ern Yangtze craton based on deep learning neural networks. Nonlinear Processes in Geophysics, 26(2), 61–71. https://doi.org/10.5194/
npg-26-61-2019
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations
using RNN encoder-decoder for statistical machine translation. (pp. 1724–1734). In Proceedings of the 2014 Conference on Empirical
Methods in Natural Language Processing (EMNLP). https://doi.org/10.3115/v1/D14-1179 arXiv preprint arXiv:1406.1078.
Chu, X., Bortnik, J., Li, W., Ma, Q., Denton, R., Yue, C., etal. (2017). A neural network model of three-dimensional dynamic electron den-
sity in the inner magnetosphere. Journal of Geophysical Research, 122(9), 9183–9197. https://doi.org/10.1002/2017ja024464
Clausen, L. B. N., & Nickisch, H. (2018). Automatic classification of auroral images from the oslo auroral themis (oath) data set using
machine learning. Journal of Geophysical Research: Space Physics, 123(7), 5640–5647. https://doi.org/10.1029/2018ja025274
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An over-
view. IEEE Signal Processing Magazine, 35(1), 53–65. https://doi.org/10.1109/MSP.2017.2765202
Das, V., & Mukerji, T. (2020). Petrophysical properties prediction from prestack seismic data using convolutional neural networks. Geo-
physics, 85(5), N41–N55. https://doi.org/10.1190/geo2019-0650.1
Das, V., Pollack, A., Wollner, U., & Mukerji, T. (2019). Convolutional neural network for seismic impedance inversion. Geophysics, 84(6),
R869–R880. https://doi.org/10.1190/geo2018-0838.1
de Figueiredo, L. P., Grana, D., Roisenberg, M., & Rodrigues, B. B. (2019). Gaussian mixture Markov chain Monte Carlo method for linear
seismic inversion. Geophysics, 84(3), R463–R476. https://doi.org/10.1190/geo2018-0529.1
DeVries, P. M. R., Viegas, F., Wattenberg, M., & Meade, B. J. (2018). Deep learning of aftershock patterns following large earthquakes.
Nature, 560(7720), 632–634. https://doi.org/10.1038/s41586-018-0438-y
Dhara, A., & Bagaini, C. (2020). Seismic image registration using multiscale convolutional neural networks. Geophysics, 85(6), V425–V441.
https://doi.org/10.1190/geo2019-0724.1
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., & Darrell, T. (2014). Decaf: A deep convolutional activation feature for generic visual recogni-
tion. International Conference on Machine Learning, 647–655. https://dl.acm.org/doi/10.5555/3044805.3044879
Dong, C., Loy, C. C., He, K., & Tang, X. (2014). Learning a deep convolutional network for image super-resolution. (pp. 184–199). European
Conference on Computer Vision. https://doi.org/10.1007/978-3-319-10593-2_13
Donoho, D. L., & Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical
Association, 90(432), 1200–1224. https://doi.org/10.1080/01621459.1995.10476626
Duan, Y., Zheng, X., Hu, L., & Sun, L. (2019). Seismic facies analysis based on deep convolutional embedded clustering. Geophysics, 84(6),
IM87–IM97. https://doi.org/10.1190/geo2018-0789.1
Dunham, M. W., Malcolm, A., & Kim Welford, J. (2019). Improved well-log classification using semisupervised label propagation and
self-training, with comparisons to popular supervised algorithms. Geophysics, 85(1), O1–O15. https://doi.org/10.1190/geo2019-0238.1
Fang, J., Zhou, H., Elita Li, Y., Zhang, Q., Wang, L., Sun, P., & Zhang, J. (2020). Data-driven low-frequency signal recovery using deep-learn-
ing predictions in full-waveform inversion. Geophysics, 85(6), A37–A43. https://doi.org/10.1190/geo2020-0159.1
Fang, K., Kifer, D., Lawson, K., & Shen, C. (2020). Evaluating the potential and challenges of an uncertainty quantification method
for long short-term memory models for soil moisture predictions. Water Resources Research, 56(12), e2020WR028095. https://doi.
org/10.1029/2020wr028095
Fang, K., Shen, C., Kifer, D., & Yang, X. (2017). Prolongation of SMAP to spatiotemporally seamless coverage of continental U.S. using a
deep learning neural network. Geophysical Research Letters, 44(21), 11030–11039. https://doi.org/10.1002/2017gl075619
Feng, D. P., Fang, K., & Shen, C. P. (2020). Enhancing streamflow forecast and extracting insights using long-short term memory networks
with data integration at continental scales. Water Resources Research, 56(9), e2019WR026793. https://doi.org/10.1029/2019wr026793
Feng, R., Mejer Hansen, T., Grana, D., & Balling, N. (2020). An unsupervised deep-learning method for porosity estimation based on post-
stack seismic data. Geophysics, 85(6), M97–M105. https://doi.org/10.1190/geo2020-0121.1
Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International
conference on machine learning. (pp. 1050–1059). PMLR. https://dl.acm.org/doi/10.5555/3045390.3045502
Gao, Z., Pan, Z., Gao, J., & Xu, Z. (2019). Building long-wavelength velocity for salt structure using stochastic full waveform inversion
with deep autoencoder based model reduction. In SEG technical program expanded abstracts. (pp. 1680–1684). Society of Exploration
Geophysicists. https://doi.org/10.1190/segam2019-3215572.1
Garofalo, F., Sauvin, G., Socco, L. V., & Lecomte, I. (2015). Joint inversion of seismic and electric data applied to 2D media. Geophysics,
80(4), EN93–EN104. https://doi.org/10.1190/geo2014-0313.1
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., etal., (2014). Generative adversarial networks. Advances in
Neural Information Processing Systems. 2672–2680. https://dl.acm.org/doi/10.5555/2969033.2969125
Grana, D., Azevedo, L., & Liu, M. (2020). A comparison of deep machine learning and Monte Carlo methods for facies classification from
seismic data. Geophysics, 85(4), WA41–WA52. https://doi.org/10.1190/geo2019-0405.1
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE conference on computer vision and pattern
recognition. (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90
He, Y., Cao, J., Lu, Y., Gan, Y., & Lv, S. (2018). Shale seismic facies recognition technology based on sparse autoencoder. In International
Geophysical Conference. (pp. 1744–1748) Society of Exploration Geophysicists and Chinese Petroleum Society. https://doi.org/10.1190/
IGC2018-428
Helmy, T., Fatai, A., & Faisal, K. (2010). Hybrid computational models for the characterization of oil and gas reservoirs. Expert Systems with
Applications, 37(7), 5353–5363. https://doi.org/10.1016/j.eswa.2010.01.021
Herrmann, F. J., & Hennenfent, G. (2008). Non-parametric seismic data recovery with curvelet frames. Geophysical Journal International,
173(1), 233–248. https://doi.org/10.1111/j.1365-246x.2007.03698.x
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
https://doi.org/10.1162/neco.1997.9.8.1735
Hu, A., Carter, B., Currie, J., Norman, R., Wu, S., & Zhang, K. (2020). A deep neural network model of global topside electron temperature
using incoherent scatter radars and its application to GNSS radio occultation. Journal of Geophysical Research, 125(2), 1–17. https://doi.
org/10.1029/2019ja027263
Hu, L., Zheng, X., Duan, Y., Yan, X., Hu, Y., & Zhang, X. (2019). First-arrival picking with a U-Net convolutional network. Geophysics, 84(6),
U45–U57. https://doi.org/10.1190/geo2018-0688.1
Huang, K., You, J., Chen, K., Lai, H., & Don, A. (2006). Neural network for parameters determination and seismic pattern detection (pp.
2285–2289). SEG Technical Program Expanded Abstracts.
Iten, R., Metger, T., Wilming, H., Del Rio, L., & Renner, R. (2020). Discovering physical concepts with neural networks. Physical Review
Letters, 124(1), 010508. https://doi.org/10.1103/physrevlett.124.010508
Jia, Y., & Ma, J. (2017). What can machine learning do for seismic data processing? An interpolation application. Geophysics, 82(3), V163–
V177. https://doi.org/10.1190/geo2016-0300.1
Jiang, G. Q., Xu, J., & Wei, J. (2018). A deep learning algorithm of neural network for the parameterization of typhoon-ocean feedback in
typhoon forecast models. Geophysical Research Letters, 45(8), 3706–3716. https://doi.org/10.1002/2018gl077004
Jiang, K., Wang, Z., Yi, P., Wang, G., Lu, T., & Jiang, J. (2019). Edge-enhanced GAN for remote sensing image superresolution. IEEE Trans-
actions on Geoscience and Remote Sensing, 57(8), 5799–5812. https://doi.org/10.1109/tgrs.2019.2902431
Kadow, C., Hall, D. M., & Ulbrich, U. (2020). Artificial intelligence reconstructs missing climate information. Nature Geoscience, 13(6),
408–413. https://doi.org/10.1038/s41561-020-0582-5
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). Imagenet classification with deep convolutional neural networks. Communications of
the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lee, S., Ji, E. Y., Moon, Y. J., & Park, E. (2021). One-day forecasting of global TEC using a novel deep learning model. Space Weather, 19(1),
2020SW002600. https://doi.org/10.1029/2020sw002600
Lei, N., An, D., Guo, Y., Su, K., Liu, S., Luo, Z., et al. (2020). A geometric understanding of deep learning. Engineering, 6(3), 361–374.
https://doi.org/10.1016/j.eng.2019.09.010
Lempitsky, V., Vedaldi, A., & Ulyanov, D. (2018). Deep image prior. In Proceedings of the IEEE conference on computer vision and pattern
recognition. (pp. 9446–9454). https://doi.org/10.1109/CVPR.2018.00984
Li, J., Bao, Q., Liu, Y., Wu, G., Wang, L., He, B., et al. (2019). Evaluation of FAMIL2 in simulating the climatology and seasonal-to-inter-
annual variability of tropical cyclone characteristics. Journal of Advances in Modeling Earth Systems, 11(4), 1117–1136. https://doi.
org/10.1029/2018ms001506
Li, L., Lin, Y., Zhang, X., Liang, H., Xiong, W., & Zhan, S. (2019).Convolutional recurrent neural networks based waveform classification
in seismic facies analysis. (pp. 2599–2603). SEG Technical Program Expanded Abstracts. https://doi.org/10.1190/segam2019-3215237.1
Li, S., Song, W., Fang, L., Chen, Y., Ghamisi, P., & Benediktsson, J. A. (2019). Deep learning for hyperspectral image classification: An over-
view. IEEE Transactions on Geoscience and Remote Sensing, 57(9), 6690–6709. https://doi.org/10.1109/tgrs.2019.2907932
Li, T., Sahu, A. K., Talwalkar, A., & Smith, V. (2020). Federated learning: Challenges, methods, and future directions. IEEE Signal Process-
ing Magazine, 37(3), 50–60. https://doi.org/10.1109/msp.2020.2975749
Li, T., Shen, H., Yuan, Q., Zhang, X., & Zhang, L. (2017). Estimating ground-level PM2.5 by fusing satellite and station observations: A
geo-intelligent deep learning approach. Geophysical Research Letters, 44(23), 11985–11993. https://doi.org/10.1002/2017gl075710
Li, Z., Meier, M. A., Hauksson, E., Zhan, Z., & Andrews, J. (2018). Machine learning seismic wave discrimination: Application to earth-
quake early warning. Geophysical Research Letters, 45(10), 4773–4779. https://doi.org/10.1029/2018gl077870
Liang, J., Ma, J., & Zhang, X. (2014). Seismic data restoration via data-driven tight frame. Geophysics, 79(3), V65–V74. https://doi.
org/10.1190/geo2013-0252.1
Lim, J. S. (2005). Reservoir properties determination using fuzzy logic and neural networks from well data in offshore Korea. Journal of
Petroleum Science and Engineering, 49(3–4), 182–192. https://doi.org/10.1016/j.petrol.2005.05.005
Ling, F., Boyd, D., Ge, Y., Foody, G. M., Li, X., Wang, L., etal. (2019). Measuring river wetted width from remotely sensed imagery at the sub-
pixel scale with a deep convolutional neural network. Water Resources Research, 55(7), 5631–5649. https://doi.org/10.1029/2018wr024136
Linville, L., Pankow, K., & Draelos, T. (2019). Deep learning models augment analyst decisions for event discrimination. Geophysical Re-
search Letters, 46(7), 3643–3651. https://doi.org/10.1029/2018gl081119
Liu, B., Li, X., & Zheng, G. (2019). Coastal inundation mapping from bitemporal and dual-polarization SAR imagery based on deep convo-
lutional neural networks. Journal of Geophysical Research: Oceans, 124(12), 9101–9113. https://doi.org/10.1029/2019jc015577
Liu, L., Zou, S., Yao, Y., & Wang, Z. (2020). Forecasting global ionospheric TEC using deep learning approach. Space Weather, 18(11),
e2020SW002501. https://doi.org/10.1029/2020sw002501
Liu, S. (2020). Multi-parameter full waveform inversions based on recurrent neural networks (Master's thesis). Harbin Institute of Technology, China.
Maggiori, E., Tarabalka, Y., Charpiat, G., & Alliez, P. (2017). Convolutional neural networks for large-scale remote-sensing image classifi-
cation. IEEE Transactions on Geoscience and Remote Sensing, 55(2), 645–657. https://doi.org/10.1109/tgrs.2016.2612821
Makhzani, A. (2018). Unsupervised representation learning with autoencoders. (Doctoral dissertation), University of Toronto (Canada).
Malfante, M., Dalla Mura, M., Mars, J. I., Metaxian, J. P., Macedo, O., & Inza, A. (2018). Automatic classification of volcano seismic signa-
tures. Journal of Geophysical Research: Solid Earth, 123(12), 10645–10658. https://doi.org/10.1029/2018jb015470
Mallat, S. (2012). Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10), 1331–1398. https://doi.
org/10.1002/cpa.21413
Mandelli, S., Borra, F., Lipari, V., Bestagini, P., Sarti, A., & Tubaro, S. (2018). Seismic data interpolation through convolutional autoencoder. In SEG technical program expanded abstracts (pp. 4101–4105). https://doi.org/10.1190/segam2018-2995428.1
Manucharyan, G. E., Siegelman, L., & Klein, P. (2021). A deep learning approach to spatiotemporal sea surface height interpolation and
estimation of deep currents in geostrophic ocean turbulence. Journal of Advances in Modeling Earth Systems, 13(1), e2019MS001965.
https://doi.org/10.1029/2019ms001965
McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. Y. (2017). Communication-efficient learning of deep networks from decentralized data. In International conference on artificial intelligence and statistics (pp. 1273–1282). PMLR.
Meier, M. A., Ross, Z. E., Ramachandran, A., Balakrishna, A., Nair, S., Kundzicz, P., et al. (2019). Reliable real-time seismic signal/noise discrimination with machine learning. Journal of Geophysical Research: Solid Earth, 124(1), 788–800. https://doi.org/10.1029/2018jb016661
Mou, L., Ghamisi, P., & Zhu, X. X. (2017). Deep recurrent neural networks for hyperspectral image classification. IEEE Transactions on
Geoscience and Remote Sensing, 55(7), 3639–3655. https://doi.org/10.1109/tgrs.2016.2636241
Mousavi, S. M., & Beroza, G. C. (2020a). A machine-learning approach for earthquake magnitude estimation. Geophysical Research Letters,
47(1), e2019GL085976. https://doi.org/10.1029/2019gl085976
Mousavi, S. M., & Beroza, G. C. (2020b). Bayesian-deep-learning estimation of earthquake location from single-station observations. IEEE
Transactions on Geoscience and Remote Sensing, 58(11), 8211–8224. https://doi.org/10.1109/tgrs.2020.2988770
Mousavi, S. M., Ellsworth, W. L., Zhu, W., Chuang, L. Y., & Beroza, G. C. (2020). Earthquake transformer—An attentive deep-learn-
ing model for simultaneous earthquake detection and phase picking. Nature Communications, 11(1), 1–12. https://doi.org/10.1038/
s41467-020-17591-w
Mousavi, S. M., Horton, S. P., Langston, C. A., & Samei, B. (2016). Seismic features and automatic discrimination of deep and shallow
induced-microearthquakes using neural network and logistic regression. Geophysical Journal International, 207(1), 29–46. https://doi.
org/10.1093/gji/ggw258
Mousavi, S. M., & Langston, C. A. (2016). Hybrid seismic denoising using higher-order statistics and improved wavelet block thresholding.
Bulletin of the Seismological Society of America, 106(4), 1380–1393. https://doi.org/10.1785/0120150345
Mousavi, S. M., & Langston, C. A. (2017). Automatic noise-removal/signal-removal based on general cross-validation thresholding in syn-
chrosqueezed domain and its application on earthquake data. Geophysics, 82(4), V211–V227. https://doi.org/10.1190/geo2016-0433.1
Mousavi, S. M., Langston, C. A., & Horton, S. P. (2016). Automatic microseismic denoising and onset detection using the synchrosqueezed
continuous wavelet transform. Geophysics, 81(4), V341–V355. https://doi.org/10.1190/geo2015-0598.1
Mousavi, S. M., Zhu, W., Ellsworth, W., & Beroza, G. (2019). Unsupervised clustering of seismic signals using deep convolutional autoen-
coders. IEEE Geoscience and Remote Sensing Letters, 16(11), 1693–1697. https://doi.org/10.1109/lgrs.2019.2909218
Mousavi, S. M., Zhu, W., Sheng, Y., & Beroza, G. C. (2019). CRED: A deep residual network of convolutional and recurrent units for earth-
quake signal detection. Scientific Reports, 9(1), 1–14. https://doi.org/10.1038/s41598-019-45748-1
Nazari Siahsar, M. A., Gholtashi, S., Kahoo, A. R., Chen, W., & Chen, Y. (2017). Data-driven multitask sparse dictionary learning for noise
attenuation of 3D seismic data. Geophysics, 82(6), V385–V396. https://doi.org/10.1190/geo2017-0084.1
Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., et al. (2020). What role does hydrological science play in the age of machine learning? Water Resources Research, 57, e2020WR028091. https://doi.org/10.1029/2020WR028091
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A. Y. (2011). Multimodal deep learning. In International conference on machine learning (pp. 689–696). https://dl.acm.org/doi/10.5555/3104482.3104569
Niu, Y., Wang, Y. D., Mostaghimi, P., Swietojanski, P., & Armstrong, R. T. (2020). An innovative application of generative adversarial net-
works for physically accurate rock images with an unprecedented field of view. Geophysical Research Letters, 47(23), e2020GL089029.
https://doi.org/10.1029/2020gl089029
Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In IEEE conference on computer vision and pattern recognition (pp. 1717–1724). https://doi.org/10.1109/CVPR.2014.222
Oropeza, V., & Sacchi, M. (2011). Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis.
Geophysics, 76(3), V25–V32. https://doi.org/10.1190/1.3552706
Ovcharenko, O., Kazei, V., Kalita, M., Peter, D., & Alkhalifah, T. (2019). Deep learning for low-frequency extrapolation from multioffset
seismic data. Geophysics, 84(6), R989–R1001. https://doi.org/10.1190/geo2018-0884.1
Park, M. J., & Sacchi, M. D. (2019). Automatic velocity analysis using convolutional neural network and transfer learning. Geophysics,
85(1), V33–V43. https://doi.org/10.1190/geo2018-0870.1
Payani, A., Fekri, F., Alregib, G., Mohandes, M., & Deriche, M. (2019). Compression of seismic signals via recurrent neural networks: Lossy and lossless algorithms. In SEG technical program expanded abstracts (pp. 4082–4086). https://doi.org/10.1190/segam2019-3207380.1
Poulton, M. M. (2002). Neural networks as an intelligence amplification tool: A review of applications. Geophysics, 67(3), 979–993. https://
doi.org/10.1190/1.1484539
Qi, J., Zhang, B., Lyu, B., & Marfurt, K. (2020). Seismic attribute selection for machine-learning-based facies analysis. Geophysics, 85(2),
O17–O35. https://doi.org/10.1190/geo2019-0223.1
Qian, F., Yin, M., Liu, X., Wang, Y., Lu, C., & Hu, G. (2018). Unsupervised seismic facies analysis via deep convolutional autoencoders.
Geophysics, 83(3), A39–A43. https://doi.org/10.1190/geo2017-0524.1
Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward
and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707. https://doi.
org/10.1016/j.jcp.2018.10.045
Ramachandram, D., & Taylor, G. W. (2017). Deep multimodal learning: A survey on recent advances and trends. IEEE Signal Processing
Magazine, 34(6), 96–108. https://doi.org/10.1109/msp.2017.2738401
Read, J. S., Jia, X., Willard, J., Appling, A. P., Zwart, J. A., Oliver, S. K., et al. (2019). Process-guided deep learning predictions of lake water temperature. Water Resources Research, 55(11), 9173–9190. https://doi.org/10.1029/2019wr024922
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., & Prabhat (2019). Deep learning and process understand-
ing for data-driven earth system science. Nature, 566(7743), 195–204. https://doi.org/10.1038/s41586-019-0912-1
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical image computing and computer-assisted intervention (pp. 234–241). Springer. https://doi.org/10.1007/978-3-319-24574-4_28
Ross, Z. E., Meier, M.-A., & Hauksson, E. (2018). P wave arrival picking and first-motion polarity determination with deep learning. Jour-
nal of Geophysical Research: Solid Earth, 123(6), 5120–5129. https://doi.org/10.1029/2017jb015251
Ross, Z. E., Yue, Y. S., Meier, M. A., Hauksson, E., & Heaton, T. H. (2019). PhaseLink: A deep learning approach to seismic phase association. Journal of Geophysical Research: Solid Earth, 124(1), 856–869. https://doi.org/10.1029/2018jb016674
Rubinstein, R., Zibulevsky, M., & Elad, M. (2010). Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE
Transactions on Signal Processing, 58(3), 1553–1564. https://doi.org/10.1109/tsp.2009.2036477
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
https://doi.org/10.1038/323533a0
Rüttgers, M., Lee, S., Jeon, S., & You, D. (2019). Prediction of a typhoon track using a generative adversarial network and satellite images.
Scientific Reports, 9(1), 1–15. https://doi.org/10.1038/s41598-019-42339-y
Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. Advances in Neural Information Processing Systems.
https://dl.acm.org/doi/10.5555/3294996.3295142
Scher, S., & Messori, G. (2021). Ensemble methods for neural network-based weather forecasts. Journal of Advances in Modeling Earth
Systems, 13(2), e2020MS002331. https://doi.org/10.1029/2020MS002331
Shahnas, M. H., & Pysklywec, R. N. (2020). Toward a unified model for the thermal state of the planetary mantle: Estimations from mean
field deep learning. Earth and Space Science, 7(7), e2019EA000881. https://doi.org/10.1029/2019ea000881
Shen, C. (2018). A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resources
Research, 54(11), 8558–8593. https://doi.org/10.1029/2018wr022643
Shen, H., Li, T., Yuan, Q., & Zhang, L. (2018). Estimating regional ground-level PM2.5 directly from satellite top-of-atmosphere reflectance using deep belief networks. Journal of Geophysical Research: Atmospheres, 123(24), 13875–13886. https://doi.org/10.1029/2018jd028759
Siahkoohi, A., Louboutin, M., & Herrmann, F. J. (2019). The importance of transfer learning in seismic modeling and imaging. Geophysics,
84(6), A47–A52. https://doi.org/10.1190/geo2019-0056.1
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on
learning representations.
Spitz, S. (1991). Seismic trace interpolation in the f-x domain. Geophysics, 56(6), 785–794. https://doi.org/10.1190/1.1443096
Subedar, M., Krishnan, R., Meyer, P. L., Tickoo, O., & Huang, J. (2019). Uncertainty-aware audiovisual activity recognition using deep Bayesian variational inference. In International conference on computer vision (pp. 6300–6309). https://doi.org/10.1109/ICCV.2019.00640
Sun, A. Y., Scanlon, B. R., Save, H., & Rateb, A. (2020). Reconstruction of GRACE total water storage through automated machine learning. Water Resources Research, 57, e2020WR028666. https://doi.org/10.1029/2020WR028666
Sun, A. Y., Scanlon, B. R., Zhang, Z., Walling, D., Bhanja, S. N., Mukherjee, A., & Zhong, Z. (2019). Combining physically based modeling and deep learning for fusing GRACE satellite data: Can we learn from mismatch? Water Resources Research, 55(2), 1179–1195. https://doi.org/10.1029/2018wr023333
Sun, B., & Alkhalifah, T. (2020). ML-descent: An optimization algorithm for full-waveform inversion using machine learning. Geophysics, 85(6), R477–R492. https://doi.org/10.1190/geo2019-0641.1
Sun, J., Niu, Z., Innanen, K. A., Li, J., & Trad, D. O. (2020). A theory-guided deep-learning formulation and optimization of seismic wave-
form inversion. Geophysics, 85(2), R87–R99. https://doi.org/10.1190/geo2019-0138.1
Tang, G., Long, D., Behrangi, A., Wang, C., & Hong, Y. (2018). Exploring deep neural networks to retrieve rain and snow in high latitudes
using multisensor and reanalysis data. Water Resources Research, 54(10), 8253–8278. https://doi.org/10.1029/2018wr023830
Tasistro-Hart, A., Grayver, A., & Kuvshinov, A. (2020). Probabilistic geomagnetic storm forecasting via deep learning. Journal of Geophys-
ical Research: Space Physics, 126, e2020JA028228. https://doi.org/10.1029/2020JA028228
Titos, M., Bueno, A., García, L., Benítez, M. C., & Ibañez, J. (2019). Detection and classification of continuous volcano-seismic signals
with recurrent neural networks. IEEE Transactions on Geoscience and Remote Sensing, 57(4), 1936–1948. https://doi.org/10.1109/
tgrs.2018.2870202
Wang, B., Zhang, N., Lu, W., & Wang, J. (2019). Deep-learning-based seismic data interpolation: A preliminary result. Geophysics, 84(1),
V11–V20. https://doi.org/10.1190/geo2017-0495.1
Wang, J., Xiao, Z., Liu, C., Zhao, D., & Yao, Z. (2019). Deep learning for picking seismic arrival times. Journal of Geophysical Research: Solid
Earth, 124(7), 6612–6624. https://doi.org/10.1029/2019jb017536
Wang, J. L., Zhuang, H., Chérubin, L. M., Ibrahim, A. K., & Ali, A. M. (2019). Medium-term forecasting of Loop Current Eddy Cameron and Eddy Darwin formation in the Gulf of Mexico with a divide-and-conquer machine learning approach. Journal of Geophysical Research: Oceans, 124(8), 5586–5606. https://doi.org/10.1029/2019jc015172
Wang, N., Chang, H., & Zhang, D. (2020). Deep-learning-based inverse modeling approaches: A subsurface flow example. Journal of Geo-
physical Research: Solid Earth, 126, e2020JB020549. https://doi.org/10.1029/2020JB020549
Wang, T., Zhang, Z., & Li, Y. (2019). EarthquakeGen: Earthquake generator using generative adversarial networks. In SEG technical program expanded abstracts (pp. 2674–2678). https://doi.org/10.1190/segam2019-3216687.1
Wang, W., & Ma, J. (2020). Velocity model building in a crosswell acquisition geometry with image-trained artificial neural network. Geo-
physics, 85(2), U31–U46. https://doi.org/10.1190/geo2018-0591.1
Wang, W., McMechan, G. A., & Ma, J. (2020). Elastic full-waveform inversion with recurrent neural networks. In SEG technical program
expanded abstracts (pp. 860–864). Society of Exploration Geophysicists. https://doi.org/10.1190/segam2020-3425921.1
Wang, W., McMechan, G. A., Ma, J., & Xie, F. (2021). Automatic velocity picking from semblances with a new deep-learning regression
strategy: Comparison with a classification approach. Geophysics, 86(2), U1–U13. https://doi.org/10.1190/geo2020-0423.1
Wang, Y., Ge, Q., Lu, W., & Yan, X. (2019). Seismic impedance inversion based on cycle-consistent generative adversarial network. In SEG technical program expanded abstracts (pp. 2498–2502). https://doi.org/10.1190/segam2019-3203757.1
Wang, Y., Wang, B., Tu, N., & Geng, J. (2020). Seismic trace interpolation for irregularly spatial sampled data using convolutional autoen-
coder. Geophysics, 85(2), V119–V130. https://doi.org/10.1190/geo2018-0699.1
Wu, H., Zhang, B., Li, F., & Liu, N. (2019). Semiautomatic first-arrival picking of microseismic events by using the pixel-wise convolutional
image segmentation method. Geophysics, 84(3), V143–V155. https://doi.org/10.1190/geo2018-0389.1
Wu, H., Zhang, B., Lin, T., Cao, D., & Lou, Y. (2019). Semiautomated seismic horizon interpretation using the encoder-decoder convolu-
tional neural network. Geophysics, 84(6), B403–B417. https://doi.org/10.1190/geo2018-0672.1
Wu, H., Zhang, B., Lin, T., Li, F., & Liu, N. (2019). White noise attenuation of seismic trace by integrating variational mode decomposition
with convolutional neural network. Geophysics, 84(5), V307–V317. https://doi.org/10.1190/geo2018-0635.1
Wu, X., Geng, Z., Shi, Y., Pham, N., Fomel, S., & Caumon, G. (2020). Building realistic structure models to train convolutional neural net-
works for seismic structural interpretation. Geophysics, 85(4), WA27–WA39. https://doi.org/10.1190/geo2019-0375.1
Wu, X., Liang, L., Shi, Y., & Fomel, S. (2019). FaultSeg3D: Using synthetic data sets to train an end-to-end convolutional neural network for 3D seismic fault segmentation. Geophysics, 84(3), IM35–IM45. https://doi.org/10.1190/geo2018-0646.1
Wu, X., Shi, Y., Fomel, S., Liang, L., Zhang, Q., & Yusifov, A. Z. (2019). FaultNet3D: Predicting fault probabilities, strikes, and dips with a single convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing, 57(11), 9138–9155. https://doi.org/10.1109/tgrs.2019.2925003
Wu, Y., & McMechan, G. A. (2019). Parametric convolutional neural network-domain full-waveform inversion. Geophysics, 84(6), R881–
R896. https://doi.org/10.1190/geo2018-0224.1
Xiao, C., Deng, Y., & Wang, G. (2021). Deep-learning-based adjoint state method: Methodology and preliminary application to inverse
modelling. Water Resources Research, 57(2), e2020WR027400. https://doi.org/10.1029/2020wr027400
Yamaga, N., & Mitsui, Y. (2019). Machine learning approach to characterize the postseismic deformation of the 2011 Tohoku-Oki earth-
quake based on recurrent neural network. Geophysical Research Letters, 46(21), 11886–11892. https://doi.org/10.1029/2019gl084578
Yang, F., & Ma, J. (2019). Deep-learning inversion: A next-generation seismic velocity model building method. Geophysics, 84(4), R583–
R599. https://doi.org/10.1190/geo2018-0249.1
Yang, Q., Tao, D., Han, D., & Liang, J. (2019). Extracting auroral key local structures from all-sky auroral image by artificial intelligence
technique. Journal of Geophysical Research: Space Physics, 124(5), 3512–3521. https://doi.org/10.1029/2018ja026119
Yoo, D., & Kweon, I. S. (2019). Learning loss for active learning. In IEEE conference on computer vision and pattern recognition (pp. 93–102). https://doi.org/10.1109/CVPR.2019.00018
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Proceedings of the neural information processing systems (pp. 3320–3328). https://dl.acm.org/doi/10.5555/2969033.2969197
You, N., Li, Y. E., & Cheng, A. (2020). Shale anisotropy model building based on deep neural networks. Journal of Geophysical Research:
Solid Earth, 125(2), e2019JB019042. https://doi.org/10.1029/2019jb019042
Yu, S., Ma, J., & Osher, S. (2016). Monte Carlo data-driven tight frame for seismic data recovery. Geophysics, 81(4), V327–V340. https://doi.
org/10.1190/geo2015-0343.1
Yu, S., Ma, J., & Wang, W. (2019). Deep learning for denoising. Geophysics, 84(6), V333–V350. https://doi.org/10.1190/geo2018-0668.1
Yu, S., Ma, J., Zhang, X., & Sacchi, M. (2015). Interpolation and denoising of high-dimensional seismic data by learning a tight frame.
Geophysics, 80(5), V119–V132. https://doi.org/10.1190/geo2014-0396.1
Yuan, P., Wang, S., Hu, W., Wu, X., Chen, J., & Van Nguyen, H. (2020). A robust first-arrival picking workflow using convolutional and
recurrent neural networks. Geophysics, 85(5), U109–U119. https://doi.org/10.1190/geo2019-0437.1
Zhang, C., Frogner, C., Araya-Polo, M., & Hohl, D. (2014). Machine-learning based automated fault detection in seismic traces. In 76th EAGE conference and exhibition (pp. 1–5). https://doi.org/10.3997/2214-4609.20141500
Zhang, H., Yang, X., & Ma, J. (2020). Can learning from natural image denoising be used for seismic data interpolation? Geophysics, 85(4), WA115–WA136. https://doi.org/10.1190/geo2019-0243.1
Zhang, K., Zuo, W., Chen, Y., Meng, D., & Zhang, L. (2017). Beyond a Gaussian denoiser: Residual learning of deep CNN for image denois-
ing. IEEE Transactions on Image Processing, 26(7), 3142–3155. https://doi.org/10.1109/tip.2017.2662206
Zhang, K., Zuo, W., Gu, S., & Zhang, L. (2017). Learning deep CNN denoiser prior for image restoration. In IEEE conference on computer vision and pattern recognition (pp. 2808–2817).
Zhang, X., Zhang, J., Yuan, C., Liu, S., Chen, Z., & Li, W. (2020). Locating induced earthquakes with a network of seismic stations in Okla-
homa via a deep learning method. Scientific Reports, 10(1), 1941. https://doi.org/10.1038/s41598-020-58908-5
Zhang, Z., & Alkhalifah, T. (2019). Regularized elastic full-waveform inversion using deep learning. Geophysics, 84(5), R741–R751. https://
doi.org/10.1190/geo2018-0685.1
Zhang, Z., Stanev, E. V., & Grayek, S. (2020). Reconstruction of the basin-wide sea-level variability in the North Sea using coastal data and generative adversarial networks. Journal of Geophysical Research: Oceans, 125(12), e2020JC016402. https://doi.org/10.1029/2020jc016402
Zhang, Z., Wang, H., Xu, F., & Jin, Y. (2017). Complex-valued convolutional neural network and its application in polarimetric SAR image
classification. IEEE Transactions on Geoscience and Remote Sensing, 55(12), 7177–7188. https://doi.org/10.1109/tgrs.2017.2743222
Zhao, M., Chen, S., Fang, L., & David, A. Y. (2019). Earthquake phase arrival auto-picking based on U-shaped convolutional neural network. Chinese Journal of Geophysics, 62(8), 3034–3042. https://doi.org/10.6038/cjg2019M0495
Zhong, Y., Ye, R., Liu, T., Hu, Z., & Zhang, L. (2020). Automatic aurora image classification framework based on deep learning for occurrence distribution analysis: A case study of all-sky image datasets from the Yellow River Station. Journal of Geophysical Research: Space Physics, 125, e2019JA027590. https://doi.org/10.1029/2019JA027590
Zhou, Y., Yue, H., Kong, Q., & Zhou, S. (2019). Hybrid event detection and phase-picking algorithm using convolutional and recurrent
neural networks. Seismological Research Letters, 90(3), 1079–1087. https://doi.org/10.1785/0220180319
Zhu, J., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Inter-
national conference on computer vision. (pp. 2242–2251). https://doi.org/10.1109/ICCV.2017.244
Zhu, W., Mousavi, S. M., & Beroza, G. C. (2019). Seismic signal denoising and decomposition using deep neural networks. IEEE Transac-
tions on Geoscience and Remote Sensing, 57(11), 9476–9488. https://doi.org/10.1109/tgrs.2019.2926772