
Multilayer Sparse LSM = Deep Neural Network

Zhaolun Liu¹ and Gerard Schuster¹

¹King Abdullah University of Science and Technology (KAUST)

SUMMARY

We recast the multilayered sparse inversion problem as a multilayered neural network problem. Unlike standard least squares migration (LSM), which finds the optimal reflectivity image, neural network least squares migration (NNLSM) finds both the optimal reflectivity image and the quasi-migration Green's functions. These quasi-migration Green's functions are also denoted as the convolutional filters in a convolutional neural network and are similar to migration Green's functions. The advantage of NNLSM over standard LSM is that its computational cost is significantly less and it can be used for denoising migration images. Its disadvantage is that the NNLSM reflectivity image is only an approximation to the actual reflectivity distribution.

INTRODUCTION

The biological processing of information in a cat's brain can be mathematically approximated as a weighted summation of input values into a vertical layer of neurons, followed by a thresholding operation (Hubel and Wiesel, 1962). Thresholding simplifies the amount of information to be processed by eliminating unimportant input values that fall below a certain threshold. This pair of operations, weighted summation and thresholding of the input values into the first layer of neurons, leads to new inputs inserted into a second layer of neurons. These new inputs are reweighted, summed, and thresholded to inject into the second column of neurons. This process is repeated again and again to form a multi-layered neural network. Such networks are now used in many areas to automatically classify and make decisions about large data sets (LeCun et al., 2015).

Trial-and-error experimentation with neural networks over several decades generated the architecture known as deep convolutional neural networks (CNNs). The success of current CNN architectures is evidenced by their practical applications to self-driving cars, image classification and retrieval from digital archives, and medical diagnosis from a combination of body images generated by MRIs, CAT scans, and PET scans.

Until recently, the design of effective CNN architectures was largely based on heuristic experimentation. This shortcoming largely results from the absence of a rigorous mathematical foundation for neural networks in general, and CNNs in particular. Elad and his coauthors proposed that the CNN problem could be recast as finding the sparsest model m under the L1 norm subject to honoring the data misfit constraint ||Γm − m_mig||_2^2 ≤ β (Papyan et al., 2017; Elad, 2018):

Given: Γ, m_mig, and Γm + noise = m_mig,

Find: m* = arg min_m ||m||_1 subject to ||Γm − m_mig||_2^2 ≤ β,   (1)

where m is an N×1 real-valued input vector and Γ is an N×N matrix of real-valued weights. The scalar β is the specified noise tolerance. The iterative solution to this problem is a series of forward-modeling operations of a neural network, where each layer consists of a concatenation of a weighted summation of input values to give the vector z, followed by a two-sided soft-thresholding operation denoted as σ(z) (Papyan et al., 2017).

We now show that the sparse solution to the least squares migration problem reduces to the forward modeling operations of a multi-layered neural network. Instead of just finding the optimal m*, we optimize for both the reflectivity m and the quasi-migration Green's functions Γ. These quasi-migration Green's functions are also denoted as the convolutional filters in a convolutional neural network, and they approximate the role of the migration Green's function (Schuster and Hu, 2000). The final image is denoted as the NNLSM estimate of the reflectivity distribution that honors the L1 sparsity condition. The next section shows the connection between the multilayer neural network and the solution to the multilayer LSM problem. This is followed by numerical examples with a synthetic salt model and field data from the North Sea.

THEORY OF NEURAL NETWORK LEAST SQUARES MIGRATION

The theory of standard least squares migration is first presented to establish the benchmark solution under the L2 norm. This is then followed by the derivation of the sparse least squares migration (SLSM) solution for a single-layer network. The final subsection derives the NNLSM solution for a multi-layer network.

Least Squares Migration

The least squares migration problem in the image domain is defined (Schuster and Hu, 2000) as finding the reflectivity coefficients m_i in the N×1 vector m that minimize the L2 objective function ε:

m* = arg min_m { (1/2)||Γm − m_mig||_2^2 },   (2)

where Γ = L^T L is the symmetric N×N Hessian matrix, L is the forward modeling operator, and L^T is the migration operator. m_mig = L^T d is the migration image computed by migrating the recorded data d with the migration operator L^T. The kernel associated with the Hessian matrix L^T L is also known as the point-scatterer response of the migration operator, or the migration Green's function (Schuster and Hu, 2000).

The formal solution to equation 2 is

m* = Γ^{−1} m_mig,   (3)

but it is too expensive to directly compute the inverse Hessian Γ^{−1}. Instead, a gradient method gives the iterative solution

m^(k+1) = m^(k) − α Γ^T (Γ m^(k) − m_mig),   (4)

where α is the step length, Γ is symmetric, and m^(k) is the solution at the kth iteration.
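The gradient iteration in equation 4 amounts to repeated matrix-vector products. A minimal sketch in Python, where the small symmetric Hessian and reflectivity below are synthetic stand-ins rather than operators from the paper:

```python
import numpy as np

# Synthetic stand-ins (not from the paper): a small symmetric positive-definite
# "Hessian" Gamma and a true reflectivity m_true, with m_mig = Gamma @ m_true.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
Gamma = A.T @ A + 8.0 * np.eye(8)        # symmetric and well conditioned
m_true = rng.standard_normal(8)
m_mig = Gamma @ m_true                   # migration image m_mig = Gamma m

# Equation 4: m <- m - alpha * Gamma^T (Gamma m - m_mig).
alpha = 1.0 / np.linalg.norm(Gamma, 2) ** 2   # step below 1/Lipschitz constant
m = np.zeros(8)
for _ in range(2000):
    m = m - alpha * Gamma.T @ (Gamma @ m - m_mig)
```

With this step length the iterates converge to the least squares solution m* = Γ^{−1} m_mig of equation 3 without ever forming the inverse Hessian.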

Sparse Least Squares Migration

The sparse least squares migration (SLSM) problem is defined as finding the reflectivity coefficients m_i in the N×1 vector m that minimize the objective function ε (Perez et al., 2013):

ε = (1/2)||Γm − m_mig||_2^2 + λ S(m),   (5)

where Γ = L^T L represents the migration Green's function (Schuster and Hu, 2000), m_mig = L^T d is the migration image, λ > 0 is a positive scalar, and S(m) is a sparseness function. For example, the sparseness function might be S(m) = ||m||_1 or S(m) = log(1 + ||m||_2^2).

Single-Layer Sparse LSM

The solution to equation 5 is

m* = arg min_m [ (1/2)||Γm − m_mig||_2^2 + λ S(m) ],   (6)

which can be approximated by an iterative gradient descent method:

m_i^(k+1) = m_i^(k) − α [Γ^T (Γm − m_mig) + λ S(m)′]_i
          = m_i^(k) − α [Γ^T r + λ S(m)′]_i,   (7)

where r = Γm − m_mig is the residual.

Here, S(m)′_i is the derivative of the sparseness function with respect to the model parameter m_i, and α is the step length. Vectors and matrices are, respectively, denoted by boldface lowercase and uppercase letters. When S(m) = ||m||_1, the solution in equation 7 can be recast as

m_i^(k+1) = soft([m^(k) − (1/α) Γ^T (Γ m^(k) − m_mig)]_i, λ/α),   (8)
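Equation 8 has the form of an iterative shrinkage-thresholding (ISTA) update: a gradient step on the misfit followed by a soft threshold. A minimal sketch with a small synthetic Γ and a sparse stand-in reflectivity (illustrative values, not the paper's):

```python
import numpy as np

def soft(z, t):
    """Two-sided soft thresholding: shrink |z| by t and zero out |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Synthetic stand-ins (not from the paper): Gamma and a sparse m_true.
rng = np.random.default_rng(1)
A = rng.standard_normal((12, 12))
Gamma = A.T @ A + 12.0 * np.eye(12)      # symmetric "Hessian"
m_true = np.zeros(12)
m_true[2], m_true[7] = 1.5, -2.0         # two-spike reflectivity
m_mig = Gamma @ m_true

# Equation 8: gradient step on the misfit, then soft threshold.
lam = 0.1
alpha = np.linalg.norm(Gamma, 2) ** 2    # so that 1/alpha is a safe step
m = np.zeros(12)
for _ in range(3000):
    m = soft(m - (1.0 / alpha) * Gamma.T @ (Gamma @ m - m_mig), lam / alpha)
```

For small λ the iterates recover the sparse spikes up to a small shrinkage bias.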

where soft is the two-sided soft thresholding function (Elad, 2010). Equation 8 is similar to the forward modeling procedure of a one-layer neural network. That is, set k = 0 and m^(0) = 0, and let the input vector be the residual vector r = −(Γ m^(0) − m_mig) = m_mig, so that the first-iterate solution can be compactly represented by

m^(1) = soft(Γ^T m_mig, λ),   (9)

where α = 1. Here, the input vector r = m_mig is multiplied by the matrix Γ^T to give z = Γ^T r, and z is then thresholded and shrunk to give the output m = soft(z, λ). If we impose a positivity constraint on z and a shrinkage constraint so that λ is small, then the soft thresholding function becomes a one-sided threshold function, also known as the Rectified Linear Unit, or ReLU, function. To simplify the notation, the soft(z, λ) function or ReLU(z) function is replaced by σ_λ(z), so that equation 9 is given by

m^(1) = σ_λ(Γ^T m_mig).   (10)

For the ReLU function there is no shrinkage, so λ = 0.
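The relationship between the two-sided soft threshold and the ReLU stated above can be checked numerically; the values below are illustrative:

```python
import numpy as np

def soft(z, lam):
    """Two-sided soft threshold sigma_lambda(z)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def relu(z):
    """Rectified Linear Unit: a one-sided threshold with no shrinkage."""
    return np.maximum(z, 0.0)

z = np.array([-2.0, -0.3, 0.0, 0.3, 2.0])
shrunk = soft(z, 0.3)        # two-sided: [-1.7, 0., 0., 0., 1.7]
pos = np.maximum(z, 0.0)     # positivity constraint imposed on z
```

With λ = 0 and the positivity constraint applied, soft(pos, 0) coincides with ReLU(z), which is the reduction used in equation 10.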

We propose neural network LSM, which finds both Γ* and m* that minimize equation 5 (Liu and Schuster, 2018). To find the solution we use the alternating steepest descent method, which alternates between finding Γ* after, say, 15 iterations and then finding m* after the same number of iterations. Γ* contains the approximate migration Green's functions, also known as basis functions or convolutional filters (Liu and Schuster, 2018). Each filter is used to compute a feature map, which corresponds to a sub-image of reflection coefficients in the context of LSM. As will be seen later, convolutional filters that appear to be coherent noise can be excluded in order to denoise the migration image. The advantage of this approach is that only inexpensive matrix-vector multiplications are used; no expensive solutions to the wave equation are needed for backward and forward propagation of the wavefield.
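The alternating descent can be sketched at toy scale: hold Γ fixed and take ISTA steps on m, then hold m fixed and take gradient steps on Γ. This is only a schematic stand-in for the paper's convolutional implementation; the sizes, step lengths, and iteration counts are illustrative assumptions:

```python
import numpy as np

def soft(z, t):
    """Two-sided soft threshold."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(2)
n = 16
m_mig = rng.standard_normal(n)                   # stand-in migration image
Gamma = np.eye(n) + 0.01 * rng.standard_normal((n, n))  # initial basis
m = np.zeros(n)
lam, n_alt, n_inner = 0.01, 10, 15

def misfit(Gamma, m):
    return 0.5 * np.sum((Gamma @ m - m_mig) ** 2)

history = [misfit(Gamma, m)]
for _ in range(n_alt):
    # m-update: ISTA steps (equation 8) with Gamma held fixed.
    L = np.linalg.norm(Gamma, 2) ** 2
    for _ in range(n_inner):
        m = soft(m - (1.0 / L) * Gamma.T @ (Gamma @ m - m_mig), lam / L)
    # Gamma-update: gradient steps on the misfit with m held fixed;
    # the gradient of 0.5||Gamma m - m_mig||^2 with respect to Gamma is r m^T.
    for _ in range(n_inner):
        r = Gamma @ m - m_mig
        step = 1.0 / max(m @ m, 1e-12)
        Gamma = Gamma - step * np.outer(r, m)
    history.append(misfit(Gamma, m))
```

Each alternation only involves matrix-vector products, mirroring the cost argument above: the misfit decreases without any wave-equation solves.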

Multilayer Neural Network LSM

Similar to the derivation by Elad (2018) and Papyan et al. (2017), the multilayer neural network LSM problem is defined as follows.

Find m_i and Γ_i such that

m_1* = arg min_{m_1, Γ_1} [ (1/2)||Γ_1 m_1 − m_mig||_2^2 + λ S(m_1) ],
m_2* = arg min_{m_2, Γ_2} [ (1/2)||Γ_2 m_2 − m_1*||_2^2 + λ S(m_2) ],
...
m_N* = arg min_{m_N, Γ_N} [ (1/2)||Γ_N m_N − m*_{N−1}||_2^2 + λ S(m_N) ],   (11)

where Γ_i is the Hessian matrix in the ith layer. The first-iterate solution to the above system of equations can be cast in a form similar to equation 10, except we have

m_N* ≈ σ_λ(Γ_N^T σ_λ(Γ_{N−1}^T (... σ_λ(Γ_1^T m_mig) ...))),   (12)

which is a repeated concatenation of the two operations of a multilayered neural network: matrix-vector multiplication followed by a thresholding operation. In all cases we use a convolutional neural network where different filters are applied to the input from the previous layer to give feature maps associated with the next layer.
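The cascade in equation 12 is simply a loop of thresholded matrix-vector products. A shape-level sketch with randomly chosen stand-in Γ_i matrices (the layer sizes are illustrative, not the paper's):

```python
import numpy as np

def soft(z, lam):
    """Two-sided soft threshold sigma_lambda."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Equation 12 as a forward pass: one thresholded product per layer.
rng = np.random.default_rng(3)
sizes = [64, 32, 16, 8]                  # m_mig length, then layer widths
Gammas = [rng.standard_normal((sizes[i], sizes[i + 1]))
          for i in range(3)]             # stand-ins for Gamma_1..Gamma_3
m_mig = rng.standard_normal(64)
lam = 0.5

m = m_mig
for G in Gammas:                         # m_N = s(G_N^T s(... s(G_1^T m_mig)))
    m = soft(G.T @ m, lam)
```

The loop body is exactly the two operations named above, so the first-iterate multilayer solution costs N matrix-vector products plus N thresholdings.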

For a perfect prediction of the migration image, m_mig can also be approximated as m_mig = Γ_1 Γ_2 ... Γ_N m_N. We refer to Γ^(i) as the effective basis function at the ith level, i.e., Γ^(i) = Γ_1 Γ_2 ... Γ_i, so that

m_mig = Γ^(i) m_i.   (13)
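Equation 13 is matrix associativity: applying the layer matrices one at a time matches applying the single effective basis Γ^(i). A quick numerical check with stand-in matrices (sizes are illustrative):

```python
import numpy as np

# Stand-in layer matrices Gamma_1, Gamma_2, Gamma_3 and coefficients m_3.
rng = np.random.default_rng(4)
G1 = rng.standard_normal((64, 32))
G2 = rng.standard_normal((32, 16))
G3 = rng.standard_normal((16, 8))
m3 = rng.standard_normal(8)

m_mig = G1 @ (G2 @ (G3 @ m3))            # layer-by-layer reconstruction
G_eff = G1 @ G2 @ G3                     # effective basis Gamma^(3)
```

The effective basis maps the deepest coefficients straight back to the migration image in one product.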


NUMERICAL RESULTS

We provide numerical examples of the multilayer NNLSM defined in equation 11 by using a migration image associated with the 2D SEG/EAGE salt velocity model shown in Figure 1a. The model is gridded with 101 points in the z-direction and 101 points in the x-direction, with a grid interval of 40 m in the x-direction and 20 m in the z-direction. Figure 1b shows the reverse time migration (RTM) image. The multilayer NNLSM consists of 3 convolutional layers: the first contains 15 basis functions, i.e. filters, of size 11×11 grid points; the second consists of 15 basis functions of dimensions 11×11×15; and the last contains 15 basis functions of dimensions 11×11×15. Equation 11 is solved for both m_i and Γ_i (i ∈ {1, 2, 3}) by the two-step iterative procedure denoted as the alternating descent method. The computed effective basis functions for these layers are shown in Figures 1c-1e, where the yellow, red, and green boxes indicate the size of the effective basis functions, also known as quasi-migration Green's functions. The basis functions of the first layer Γ_1 contain very simple small-dimensional edges, which are called "atoms" by Elad (2018). The non-zeros of the second set of basis functions Γ_2 combine a few atoms from Γ_1 to create slightly more complex edges, junctions, and corners in the effective basis function Γ^(2). Lastly, Γ_3 combines atoms from Γ^(2) in order to create more complex structures of the migration image. The corresponding stacked coefficient images, also known as feature maps, are shown in Figures 1f-1h, which give the reflectivity distributions. The reconstructed migration images are shown in Figures 1i-1k.

We apply the one-layer NNLSM method to field data collected in the North Sea (Schroot and Schüttenhelm, 2003). The time migration image is shown in Figure 2a. The time axis is gridded with 213 evenly spaced points and there are 301 grid points along the x-axis. Twenty-one 13×5 (grid point) convolutional basis functions are estimated by the alternating NNLSM procedure (see Figure 2b). These filters approximate dip-filtered migration Green's functions, and a basis function is marked by the yellow boxes in Figures 2a and 2b. The stacked coefficients (reflectivity distribution) are displayed in Figure 2c. It is evident that the stacked coefficients can provide a high-resolution migration image. The migration image reconstructed from the learned basis functions and coefficients, shown in Figure 2d, contains less noise.

SUMMARY

Neural network least squares migration finds the optimal reflectivity distribution m(x) and quasi-migration Green's functions that minimize a sum of migration misfit and sparsity functions. The advantages of NNLSM over standard LSM are that its computational cost is significantly less than that for LSM and it can be used for filtering both coherent and incoherent noise in migration images. Its disadvantage is that the NNLSM reflectivity image is only an approximation to the actual reflectivity distribution.

To our knowledge, this paper is the first example of connecting the physics of seismic imaging with the mathematics of sparse data representation and deep learning. For example, 1) quasi-migration Green's functions = CNN filters = basis functions, and 2) reflectivity maps = CNN feature maps = weights for the basis functions. This connection to LSM might be extended to using CNNs to inexpensively approximate the operations of full waveform inversion.


Figure 1: (a) 2D SEG/EAGE salt model, (b) RTM image, (c)-(e) learned effective basis functions Γ^(1), Γ^(2), and Γ^(3), (f)-(h) stacked reflectivity coefficients for m_1, m_2, and m_3, (i)-(k) reconstructed migration images Γ^(1)m_1, Γ^(2)m_2, and Γ^(3)m_3.

Figure 2: a) Migration image computed from the F3 offshore block data, b) learned basis functions, c) stacked coefficients, and d) migration image after filtering.


REFERENCES

Elad, M., 2010, Sparse and redundant representations: From theory to applications in signal and image processing: Springer Science & Business Media.

——–, 2018, Sparse modeling in image processing and deep learning: https://www.youtube.com/playlist?list=PL0H3pMD88m8W39EbuArGLa9yXk8q6QGip. (Accessed: 2019-03-27).

Hubel, D. H., and T. N. Wiesel, 1962, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex: The Journal of Physiology, 160, 106–154.

LeCun, Y., Y. Bengio, and G. Hinton, 2015, Deep learning: Nature, 521, 436–444.

Liu, Z., and G. Schuster, 2018, Neural network least squares migration: Presented at the First EAGE/SBGf Workshop on Least-Squares Migration.

Papyan, V., Y. Romano, and M. Elad, 2017, Convolutional neural networks analyzed via convolutional sparse coding: The Journal of Machine Learning Research, 18, 2887–2938.

Perez, D. O., D. R. Velis, and M. D. Sacchi, 2013, Estimating sparse-spike attributes from AVA data using a fast iterative shrinkage-thresholding algorithm and least squares, in SEG Technical Program Expanded Abstracts 2013: Society of Exploration Geophysicists, 3062–3067.

Schroot, B., and R. Schüttenhelm, 2003, Expressions of shallow gas in the Netherlands North Sea: Netherlands Journal of Geosciences - Geologie en Mijnbouw, 82, 91–105.

Schuster, G. T., and J. Hu, 2000, Green's function for migration: Continuous recording geometry: Geophysics, 65, 167–175.