# User:Ifarley/Full-waveform inversion, Part 2: Adjoint modeling

This tutorial originally appeared as a featured article in The Leading Edge in January 2018 — see issue. |

This is the second part of a three-part tutorial series on full-waveform inversion (FWI) in which we provide a step-by-step walk through of setting up forward and adjoint wave equation solvers and an optimization framework for inversion. In Part 1^{[1]}, we showed how to use Devito to set up and solve acoustic wave equations with (impulsive) seismic sources and sample wavefields at the receiver locations to forward model shot records. Here in Part 2, we will discuss how to set up and solve adjoint wave equations with Devito and, from that, how we can calculate gradients and function values of the FWI objective function.

## Contents

## Introduction

The gradient of FWI is most commonly computed via the adjoint-state method, by crosscorrelating forward and adjoint wavefields and summing the contributions over all time steps^{[2]}. Calculating the gradient for one source location consists of three steps:

- Solve the forward wave equation to create a shot record. The time varying wavefield must be stored for use in step 3; techniques such as subsampling can be used to reduce the storage requirements.
- Compute the data residual (or misfit) between the predicted and observed data.
- Solve the corresponding discrete adjoint model using the data residual as the source. Within the adjoint (reverse) time loop, crosscorrelate the second time derivative of the adjoint wavefield with the forward wavefield. These crosscorrelations are summed to form the gradient.

We start with the definition and derivation of the adjoint wave equation and its Devito stencil and then show how to compute the gradient of the conventional least-squares FWI misfit function. As usual, this tutorial is accompanied by all the code you need to reproduce the figures. Go to http://github.com/seg/tutorials-2018 and follow the links.

## A simple experiment

To demonstrate the gradient computation in the simplest possible way, we perform a small seismic transmission experiment with a circular imaging phantom, i.e., a constant velocity model with a circular high-velocity inclusion in its center, as shown in Figure 1. For a transmission experiment, we place 21 seismic sources on the left-hand side of the model and 101 receivers on the right-hand side.

We will use the forward propagator from Part 1 to independently model the 21 “observed” shot records using the true model. As the initial model for our gradient calculation, we use a constant velocity model with the same velocity as the true model, but without the circular velocity perturbation. We will then model the 21 predicted shot records for the initial model, calculate the data residual and gradient for each shot, and sum them to obtain the full gradient.

## The adjoint wave equation

Adjoint wave equations are a main component in seismic inversion algorithms and are required for computing gradients of both linear and nonlinear objective functions. To ensure stability of the adjoint modeling scheme and the expected convergence of inversion algorithms, it is important that the adjoint wave equation is in fact the adjoint (transpose) of the forward wave equation. The derivation of the adjoint wave equation in the acoustic case is simple, as it is self-adjoint if we ignore the absorbing boundaries for the moment. However, in the general case, discrete wave equations do not have this property (such as the coupled anisotropic TTI wave equation^{[3]}) and require correct derivations of their adjoints. We concentrate here, as in Part 1, on the acoustic case and follow an optimize-discretize approach, which means we write out the adjoint wave equation for the continuous case first and then discretize it, using finite-difference operators of the same order as for the forward equation. With the variables defined as in Part 1 and the data residual , located at (receiver locations) as the adjoint source, the continuous adjoint wave equation is given by:

The adjoint acoustic wave equation is equivalent to the forward equation with the exception of the damping term , which contains a first time derivative and therefore has a change of sign in its adjoint. (A second derivative matrix is the same as its transpose, whereas a first derivative matrix is equal to its negative transpose and vice versa.)

Following the pattern of Part 1, we first define the discrete adjoint wavefield **v** as a Devito `TimeFunction`

object. For reasons we will explain later, we do not need to save the adjoint wavefield:

v = TimeFunction (name="v", grid=model.grid, time_order=2, space_order=4, save=False)

Now symbolically set up the PDE:

pde = model.m * v.dt2 - vlaplace - model.damp * v.dt

As before, we then define a stencil:

stencil_v = Eq(v.backward, solve(pde, v.backware)[0])

Just as for the forward wave equation, `stencil_v`

defines the update for the adjoint wavefield of a single time step. The only difference is that, while the forward-modeling propagator goes forward in time, the adjoint propagator goes backward in time, since the initial time conditions for the forward propagator turn into final time conditions for the adjoint propagator. As for the forward stencil, we can write out the corresponding discrete expression for the update of the adjoint wavefield:

with dt being the time-stepping interval. Once again, this expression does not contain any (adjoint) source terms so far, which will be defined as a separate `SparseFunction`

object. Since the source term for the adjoint wave equation is the difference between an observed and modeled shot record, we first define an (empty) shot record `residual`

with 101 receivers and coordinates defined in `rec_coords`

. We then set the data field `rec.data`

of our shot record to be the data residual between the observed data `d_obs`

and the predicted data `d_pred`

. The symbolic residual source expression `res_term`

for our adjoint wave equation is then obtained by *injecting* the data residual into the modeling scheme (`residual.inject`

). Since we solve the time-stepping loop backward in time, the `res_term`

is used to update the previous adjoint wavefield `v.backward`

, rather than the next wavefield. As in the forward-modeling example, the source is scaled by dt2/**m**. In Python, we have:

residual = PointSource(name='residual', ntime=nt grid=model.grid, coordinates=rec_coords) res_term = residual.inject(field=v.backward, expr=residual * dt**2 / model.m, offset=model.nbpml)

In this demonstration, there is no real data. Instead we will generate the “observed” data via forward modeling with the true model `model`

. The synthetic data is generated from the initial model `model0`

. The resulting data, and their difference, are shown in Figure 2.

Finally, we create the full propagator by adding the residual source expression to our previously defined stencil and set the flag `time_axis=Backward`

, to specify that the propagator runs backward in time:

op_adj = Operator([stencil_v] + res_term, time_axis=Backward)

In contrast to forward modeling, we do not record any measurements at the surface since we are only interested in the adjoint wavefield itself. The full script for setting up the adjoint wave equation, including an animation of the adjoint wavefield is available in `adjoint_modeling.ipynb`

.

## Computing the FWI gradient

The goal of FWI is to estimate a discrete parametrization of the subsurface by minimizing the misfit between the observed shot records of a seismic survey and numerically modeled shot records. The predicted shot records are obtained by solving an individual wave equation per shot location and depend on the parametrization **m** of our wave propagator. The most common function for measuring the data misfit between the observed and modeled data is the l2 norm, which leads to the following objective function^{[4]} ^{[5]}:

where the index *i* runs over the total number of shots ns, and the model parameters are the squared slowness. Optimization problems of this form are called nonlinear least-squares problems, since the predicted data modeled with the forward-modeling propagator (`op_fwd()`

in Part 1) depends nonlinearly on the unknown parameters *m*. The full derivation of the FWI gradient using the adjoint-state method is outside the scope of this tutorial, but conceptually we obtain the gradient by applying the chain rule and taking the partial derivative of the inverse wave equation A(m)−1 with respect to m, which yields the following expression^{[2]} ^{[6]}:

The inner sum time = 1,…, *n*t runs over the number of computational time steps *n*t and denotes the second temporal derivative of the adjoint wavefield **v**. Computing the gradient of equation 3, therefore corresponds to performing the point-wise multiplication (denoted by the symbol ⊙) of the forward wavefields with the second time derivative of the adjoint wavefield and summing over all time steps.

To avoid the need to store the adjoint wavefield, the FWI gradient is calculated in the reverse time loop while solving the adjoint wave equation. To compute the gradient *g* for the current time step *v*[time]:

The second time derivative of the adjoint wavefield is computed with a second-order finite-difference stencil and uses the three adjoint wavefields that are kept in memory during the adjoint time loop (equation 2).

In Devito, we define the gradient as a `Function`

since the gradient is computed as the sum over all time steps and therefore has no time dependence:

grad = Function(name="grad", grid=model.grid)

The update for the gradient as defined in equations 4 and 5 is then:

grad_update = Eq(grad, grad - u * v.dt2)

Now we must add the gradient update expression to the adjoint propagator `op_grad`

. This yields a single symbolic expression with update instructions for both the adjoint wavefield and the gradient:

op_grad = Operator([stencil_v] + res_term + [grad_update], time_axis=Backward)

Solving the adjoint wave equation by running the following now computes the FWI gradient for a single source. Its value is stored in `grad.data`

.

op_grad(u=u0, v=v, m=mode10.m, residual=pred.data-obs.data, time=nt, dt=dt)

Now we can iterate over all the shot locations, running the same sequence of commands each time.

This gradient can then be used for a simple gradient descent optimization loop, as illustrated at the end of the notebook `adjoint_modeling.ipynb`

. After each update, a new gradient is computed for the new velocity model until sufficient decrease of the objective or chosen number of iteration is reached. A detailed treatment of optimization and more advanced algorithms will be described in the third and final part of this tutorial series.

To model seismic data by solving the acoustic wave equation, the first necessary step is to discretize this partial differential equation (PDE), which includes discrete representations of the velocity model and wavefields, as well as approximations of the spatial and temporal derivatives using finite-differences (FD). Unfortunately, implementing these finite-difference schemes in low-level code by hand is error prone, especially when we want performant and reliable code.
The primary design objective of Devito is to allow users to define complex matrix-free finite-difference approximations from high-level symbolic definitions, while employing automated code generation to create highly optimized low-level C code. Using the symbolic algebra package SymPy to facilitate the automatic creation of derivative expressions, Devito generates computationally efficient wave propagators.^{[7]}

At the core of Devito's symbolic API are symbolic types that behave like SymPy function objects, while also managing data:

`Function`

objects represent a spatially varying function discretized on a regular Cartesian grid. For example, a function symbol`f = Function(name='f', grid=model.grid, space_order=2)`

is denoted symbolically as`f(x, y)`

. The objects provide auto-generated symbolic expressions for finite-difference derivatives through shorthand expressions like`f.dx`

and`f.dx2`

for the first and second derivative in`x`

.`TimeFunction`

objects represent a time-dependent function that has time as the leading dimension, for example`g(time, x, y)`

. In addition to spatial derivatives`TimeFunction`

symbols also provide time derivatives`g.dt`

and`g.dt2`

.`SparseFunction`

objects represent sparse components, such as sources and receivers, which are usually distributed sparsely and often located off the computational grid — these objects also therefore handle interpolation onto the model grid.

To demonstrate Devito's symbolic capabilities, let us consider a time-dependent function **u**(time,*x,y*) representing the discrete forward wavefield:

u = TimeFunction(name="u", grid=model.grid, time_order=2, space_order=2, save=True, time_dim=nt)

where the grid object provided by the model defines the size of the allocated memory region, `time_order`

and `space_order`

define the default discretization order of the derived derivative expressions.

We can now use this symbolic representation of our wavefield to generate simple discretized expressions for finite-difference derivative approximations using shorthand expressions, such as `u.dt`

and `u.dt2`

to denote *du/dt* and *(d^2 u)/(dt^2 )* respectively:

>>> u.dt -u(time - dt, x, y)/(2*dt) + u(time + dt, x, y)/(2*dt) >>> u.dt2 -2*u(time, x, y)/dt**2 + u(time - dt, x, y)/dt**2 + u(time + dt, x, y)/dt**2

Using the automatic derivation of derivative expressions, we can now implement a discretized expression for Equation 1 without the source term *q(x,y,t;x_s,y_s)*. The
model object, which we created earlier, already contains the squared discrete slowness `model.m`

and damping term `model.damp`

as `Function`

objects:

pde = model.m * u.dt2 - u.laplace + model.damp * u.dt

If we write out the (second order) second time derivative `u.dt2`

as shown earlier and ignore the damping term for the moment, our `pde`

expression translates to the following discrete the wave equation:

**(**)

with time being the current time step and dt being the time stepping interval. To propagate the wavefield, we rearrange to obtain an expression for the wavefield **u**(time+dt) at the next time step. Ignoring the damping term once again, this yields:

**(**)

We can rearrange our `pde`

expression automatically using the SymPy utility function `solve`

, then create an expression which defines the update of the wavefield for the new time step **u**(time+dt), with the command `u.forward`

:

stencil = Eq(u.forward, solve(pde, u.forward)[0])

`stencil`

represents the finite-difference approximation derived from Equation 3, including the finite-difference approximation of the Laplacian and the damping term.

Although it defines the update for a single time step only, Devito knows that we will be solving a time-dependent problem over a number of time steps because the wavefield u is a `TimeFunction`

object.

## Conclusions

We need the gradient of the FWI objective function in order to find the optimal solution. It is computed by solving adjoint wave equations and summing the point-wise product of forward and adjoint wavefields over all time steps. Using Devito, the adjoint wave equation is set up in a similar fashion as the forward wave equation, with the main difference being the (adjoint) source, which is the residual between the observed and predicted shot records.

With the ability to model shot records and compute gradients of the FWI objective function, we are ready to demonstrate how to set up more gradient-based algorithms for FWI in Part 3 next month.

## Acknowledgments

This research was carried out as part of the SINBAD II project with the support of the member organizations of the SINBAD Consortium. This work was financially supported in part by EPSRC grant EP/L000407/1 and the Imperial College London Intel Parallel Computing Centre.

## Related links

Full-waveform inversion 1: forward modeling

## References

- ↑ Louboutin, M., Witte, P., Lange, M., Kukreja, N., Luporini, F., Gorman, G., and Herrmann, F., 2017, Full-waveform inversion, Part 1: Forward modeling:
*The Leading Edge*, 36, 1033–1036. http://dx.doi.org/10.1190/tle36121033.1 - ↑
^{2.0}^{2.1}Plessix, R. E., 2016, A review of the adjoint-state method for computing the gradient of a functional with geophysical applications:*Geophysical Journal International*, 167, 495–503. http://dx.doi.org/10.1111/j.1365-246X.2006.02978.x - ↑ Zhang, Y., Zhang, H., and Zhang, G., 2011, A stable TTI reverse time migration and its implementation:
*Geophysics*, 76, WA3–WA11. http://dx.doi.org/10.1190/1.3554411 - ↑ Lions, J. L., 1971, Optimal control of systems governed by partial differential equations: Springer-Verlag Berlin Heidelberg.
- ↑ Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation:
*Geophysics*, 49, 1259–1266. http://dx.doi.org/10.1190/1.1441754 - ↑ Virieux, J. and Operato S., 2009, An overview of full-waveform inversion in exploration geophysics:
*Geophysics*, 74, WCC1–WCC26. http://dx.doi.org/10.1190/1.3238367 - ↑ Meurer A, Smith CP, Paprocki M, et al., 2017, SymPy: symbolic computing in Python. PeerJ Computer Science 3:e103 http://dx.doi.org/10.7717/peerj-cs.103

## Corresponding authors

- Corresponding author: Mathias Louboutin, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia,
`mloubouteoas.ubc.ca`

- Philipp Witte, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia
- Michael Lange, Imperial College London, London, UK
- Navjot Kukreja, Imperial College London, London, UK
- Fabio Luporini, Imperial College London, London, UK
- Gerard Gorman, Imperial College London, London, UK
- Felix J. Herrmann, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia, now at Georgia Institute of Technology, USA