Difference between revisions of "Full-waveform inversion, Part 1: Forward modeling"

From SEG Wiki
Jump to: navigation, search
(back to part 1)
Line 1: Line 1:
{{Tutorial|TLE|February 2017}}
+
{{Tutorial|TLE|December 2017}}
Full-waveform inversion, Part 3: Optimization
+
Since its re-introduction by Pratt (1999)<ref>Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model: GEOPHYSICS, 64, 888–901. {{doi|10.1190/1.1444597}}</ref>, [[full-waveform inversion|full-waveform inversion (FWI)]] has gained a lot of attention in geophysical exploration because of its ability to build high resolution velocity models more or less automatically in areas of complex [[geology]]. While there is an extensive and growing literature on the topic, publications focus mostly on technical aspects, making this topic inaccessible for a broader audience due to the lack of simple introductory resources for newcomers to [[geophysics]]. We will accomplish this by providing a hands-on walkthrough of FWI using Devito, a system based on domain-specific languages that automatically generates code for time-domain finite-differences.<ref>Lange, M., Kukreja, N., Louboutin, M., Luporini, F., Zacarias, F. V., Pandolfo, V., Gorman, G., 2016, Devito: Towards a generic finite difference DSL using symbolic python: 6th workshop on python for high-performance and scientific computing. {{doi|10.1109/PyHPC.2016.9}}</ref>
  
This tutorial is the third part of a full-waveform inversion (FWI) tutorial series with a step-by-step walkthrough of setting up forward and adjoint wave equations and building a basic FWI inversion framework. For discretizing and solving wave equations, we use Devito (http://www.opesci.org/devito-public), a Python-based domain-specific language for automated generation of finite-difference code (Lange et al., 2016). The first two parts of this tutorial (Louboutin et al., 2017, 2018) demonstrated how to solve the acoustic wave equation for modeling seismic shot records and how to compute the gradient of the FWI objective function using the adjoint-state method. With these two key ingredients, we will now build an inversion framework that can be used to minimize the FWI least-squares objective function.
+
As usual, this tutorial is accompanied by all the code you need to reproduce the figures. Go to http://github.com/seg/tutorials-2017 and follow the links. In the Notebook, we describe how to simulate synthetic data for a specified source and receiver setup and how to save the corresponding wavefields and shot records. In [[Full-waveform inversion, Part 2: Adjoint modeling|Part 2]] of this series, we will address how to calculate model updates, i.e. gradients of the FWI objective function, via adjoint modeling. Finally, in Part 3 we will demonstrate how to use this gradient as part of an optimization framework for inverting an unknown [[velocity model]].
  
 
== Introduction ==
 
== Introduction ==
FWI is a computationally and mathematically challenging problem. The computational complexity comes from the fact that an already expensive solution procedure for the wave equation needs to be repeated for a large number of source positions for each iteration of the optimization algorithm. The mathematical complexity comes from the fact that the FWI objective is known to have many local minima due to cycle skipping.
+
Devito provides a concise and straightforward computational framework for discretizing wave equations, which underlie all FWI frameworks. We will show that it generates verifiable executable code at run time for wave propagators associated with forward and (in [[Full-waveform inversion, Part 2: Adjoint modeling|Part 2]]) adjoint wave equations. Devito frees the user from the recurrent and time-consuming development of performant time-stepping codes and allows the user to concentrate on the geophysics of the problem rather than on low-level implementation details of wave-equation simulators. This tutorial covers the conventional adjoint-state formulation of full-waveform tomography<ref>Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: GEOPHYSICS, 49, 1259–1266. {{doi|10.1190/1.1441754}}</ref> that underlies most of the current methods referred to as full-waveform inversion.<ref>Virieux, J., and Operto, S., 2009, An overview of full-waveform inversion in exploration geophysics: GEOPHYSICS, 74, WCC1–WCC26. {{doi|10.1190/1.3238367}}</ref> While other formulations have been developed to improve the convergence of FWI for poor starting models, in these tutorials we will concentrate on the standard formulation that relies on the combination of a forward/adjoint pair of propagators and a correlation-based gradient. In part one of this tutorial, we discuss how to set up wave simulations for inversion, including how to express the wave equation in Devito symbolically and how to deal with the acquisition geometry.
  
This tutorial demonstrates how we can set up a basic FWI framework with two alternative gradient-based optimization algorithms: stochastic gradient descent and the Gauss–Newton method (Nocedal and Wright, 2009).
+
== What is FWI? ==
 +
FWI tries to iteratively minimize the difference between data that was acquired in a seismic survey and synthetic data that is generated from a wave simulator with an estimated (velocity) model of the subsurface. As such, each FWI framework essentially consists of a wave simulator for forward modeling the predicted data and an adjoint simulator for calculating a model update from the data misfit. This first part of this tutorial is dedicated to the forward modeling part and demonstrates how to discretize and implement the acoustic wave equation using [http://www.opesci.org/devito/ Devito].
  
We implement our inversion framework with the Julia Devito Inversion framework (JUDI) (https://github.com/slimgroup/JUDI.jl), a parallel software package for seismic modeling and inversion in the Julia programming language (Bezanson et al., 2012). JUDI provides abstractions and function wrappers that allow the implementation of wave-equation-based inversion problems such as FWI using code that closely follows the mathematical notation while using Devito's automatic code generation for solving the underlying wave equations.
+
== Wave simulations for inversion ==
 +
The acoustic wave equation with the squared slowness <math>m</math>, defined as <math>m(x,y)=c^{-2} (x,y)</math> with <math>c(x,y)</math> being the unknown spatially varying wavespeed, is given by:
  
The code to run the algorithms and generate the figures in this paper is available at http://github.com/seg/tutorials-2018.
+
{{NumBlk|:|<math>m\frac{d^{2}u(t,x,y)}{dt^{2}}-\Delta u(t,x,y)+\eta\frac{du(t,x,y)}{dt}=q(t,x,y;x_{s},y_{s})</math>|{{EquationRef|1}}}}
  
== Optimizing the FWI objective function ==
+
where <math>\Delta</math> is the Laplace operator, <math>q(t,x,y;x_s,y_s)</math> is the seismic source, located at <math>(x_s,y_s)</math> and <math>\eta(x,y)</math> is a space-dependent dampening parameter for the absorbing boundary layer.<ref>Cerjan, C., Kosloff, D., Kosloff, R., and Reshef, M., 1985, A nonreflecting boundary condition for discrete acoustic and elastic wave equations: GEOPHYSICS, 50, 705–708. {{doi|10.1190/1.1441945}}</ref> As shown in Figure 1, the physical model is extended in every direction by <code>nbpml</code> grid points to mimic an infinite domain. The dampening term <math>\eta du/dt</math> attenuates the waves in the dampening layer and prevents waves from reflecting at the model boundaries. In Devito, the discrete representations of <math>m</math> and <math>\eta</math> are contained in a <code>model</code> object that contains a grid object with all relevant information such as the origin of the coordinate system, grid spacing, size of the model and dimensions <code>time, x, y</code>:
The goal of this tutorial series is to optimize the FWI objective function with the ℓ2 misfit:
 
  
((1))
+
<pre>
where dipred and diobs are the predicted and observed seismic shot records of the ith source location, and m is the velocity model (expressed as squared slowness). In Part 1, we demonstrated how to implement a forward modeling operator to generate the predicted shot records, which we will denote as dipred = F(m; qi). In Part 2, we showed how we can compute the gradient ∇f(m) of the objective function and update our initial model using gradient descent.
+
model = Model(vp=v,             # A velocity model.
There is a snag, however. This first-order optimization algorithm has a linear convergence rate at best and typically requires many iterations to converge. Second-order optimization methods converge considerably faster. To implement them, we first approximate the objective with a second-order Taylor expansion:
+
              origin=(0, 0),    # Top left corner.
 +
              shape=(101, 101), # Number of grid points.
 +
              spacing=(10, 10), # Grid spacing in m.
 +
              nbpml=40)         # boundary layer.
 +
</pre>
  
((2))
+
[[File:Tle36121033.1 fig1.gif|thumb|center|500px|Figure 1: (a) Diagram showing the model domain, with the perfectly matched layer (PML) as an absorbing layer to attenuate the wavefield at the model boundary. (b) The example model used in this tutorial, with the source and receivers indicated. The grid lines show the cell boundaries.]]
where ϑ(δm3) represents the error term, ∇f(m0) is the gradient as implemented in Part 2, and ∇2f(m0) is the Hessian of the objective function, which we will refer to as H. Rather than using the negative gradient to incrementally update our model, as in gradient descent, we directly calculate a model update δm that leads us to the minimum. This is called Newton's method:
 
((3))
 
Although the method converges to the minimum of the FWI objective function quickly, it comes at the cost of having to compute and invert the Hessian matrix (Nocedal and Wright, 2009). Fortunately, for least-squares problems, such as FWI, the Hessian can be approximated by the Gauss-Newton (GN) Hessian JT J, where J is the Jacobian matrix. This is the partial derivative of the forward modeling operator F(m; q) with respect to m — something we can easily compute. Furthermore, the Jacobian can also be used to express the gradient of the FWI objective function as ∇f(m0) = JT (dipred − diobs), where JT is the adjoint (transposed) Jacobian. This is useful, because we now have a set of operators F, J and HGN = JT J, through which we can express both first- and second-order optimization algorithms for FWI.
 
Although forming these matrices explicitly is not possible, since they can become extremely large, we only need the action of these operators on vectors. This allows us to implement these operators matrix-free. In the following section, we will demonstrate how to set up these operators in our JUDI software framework and to how to use them to implement FWI algorithms.
 
  
== Implementing FWI in JUDI ==
+
In the <code>Model</code> instantiation, <code>vp</code> is the velocity in km/s, <code>origin</code> is the origin of the physical model in meters, <code>spacing</code> is the discrete grid spacing in meters, <code>shape</code> is the number of grid points in each dimension and nbpml is the number of grid points in the absorbing boundary layer. Is is important to note that shape is the size of the physical domain only, while the total number of grid points, including the absorbing boundary layer, will be automatically derived from <code>shape</code> and <code>nbpml</code>.
We start our demonstration by reading our data set, which consists of 16 shot records and was generated with an excerpt from the SEG/EAGE Overthrust model (Aminzadeh et al., 1997). We store it as a judiVector:
 
  
 +
== Symbolic definition of the wave propagator ==
 +
To model seismic data by solving the acoustic wave equation, the first necessary step is to discretize this partial differential equation (PDE), which includes discrete representations of the velocity model and wavefields, as well as approximations of the spatial and temporal derivatives using finite-differences (FD). Unfortunately, implementing these finite-difference schemes in low-level code by hand is error prone, especially when we want performant and reliable code.
 +
The primary design objective of Devito is to allow users to define complex matrix-free finite-difference approximations from high-level symbolic definitions, while employing automated code generation to create highly optimized low-level C code. Using the symbolic algebra package SymPy to facilitate the automatic creation of derivative expressions, Devito generates computationally efficient wave propagators.<ref>Meurer A, Smith CP, Paprocki M, et al., 2017, SymPy: symbolic computing in Python. PeerJ Computer Science 3:e103 {{doi|10.7717/peerj-cs.103}}</ref>
  
JUDI vectors such as d_obs can be used like regular Julia vectors, so we can compute norms via norm(d_obs) or the inner product via dot(d_obs, d_obs), but they contain the shot records in their original dimension. Shot records can be accessed via their respective shot number with d_obs.data[shot_no], while the header information can be accessed with d_obs.geometry. We extract the source geometry from our SEG-Y file and then manually set up a source vector q with an 8 Hz Ricker wavelet:
+
At the core of Devito's symbolic API are symbolic types that behave like SymPy function objects, while also managing data:
 +
* <code>Function</code> objects represent a spatially varying function discretized on a regular Cartesian grid. For example, a function symbol <code>f = Function(name='f', grid=model.grid, space_order=2)</code> is denoted symbolically as <code>f(x, y)</code>. The objects provide auto-generated symbolic expressions for finite-difference derivatives through shorthand expressions like <code>f.dx</code> and <code>f.dx2</code> for the first and second derivative in <code>x</code>.
 +
* <code>TimeFunction</code> objects represent a time-dependent function that has time as the leading dimension, for example <code>g(time, x, y)</code>. In addition to spatial derivatives <code>TimeFunction</code> symbols also provide time derivatives <code>g.dt</code> and <code>g.dt2</code>.
 +
* <code>SparseFunction</code> objects represent sparse components, such as sources and receivers, which are usually distributed sparsely and often located off the computational grid — these objects also therefore handle interpolation onto the model grid.
  
 +
To demonstrate Devito's symbolic capabilities, let us consider a time-dependent function '''u'''(time,''x,y'') representing the discrete forward wavefield:
  
We will now set up the forward modeling operator F(m; q) as a matrix-free operator for the inverse wave equation A(m)−1, where m is the current model, and source/receiver injection and sampling operators Ps and Pr.
+
<pre>
 +
u = TimeFunction(name="u", grid=model.grid,  
 +
                time_order=2, space_order=2,
 +
                save=True, time_dim=nt)
 +
</pre>
  
Since the dimensions of the inverse wave equation operator depend on the number of computational time steps, we calculate this number using the get_computational_nt function and set up an info object that contains some dimensionality information required by all operators.
+
where the grid object provided by the model defines the size of the allocated memory region, <code>time_order</code> and <code>space_order</code> define the default discretization order of the derived derivative expressions.
  
Then we can define Pr and Ps as matrix-free operators implementing Devito sparse point injection and interpolation (Louboutin et al., 2017). Multiplications with Ps and Pr represent sampling the wavefield at source/receiver locations, while their adjoints Ps′, Pr′ denote injecting either source wavelets or shot records into the computational grid.
+
We can now use this symbolic representation of our wavefield to generate simple discretized expressions for finite-difference derivative approximations using shorthand expressions, such as <code>u.dt</code> and <code>u.dt2</code> to denote ''du/dt'' and ''(d^2 u)/(dt^2 )'' respectively:
  
These projection and modeling operators are set up in Julia in the following way:
+
<pre>
 +
>>> u.dt
 +
-u(time - dt, x, y)/(2*dt) + u(time + dt, x, y)/(2*dt)
 +
>>> u.dt2
 +
-2*u(time, x, y)/dt**2 + u(time - dt, x, y)/dt**2 + u(time + dt, x, y)/dt**2
 +
</pre>
  
 +
Using the automatic derivation of derivative expressions, we can now implement a discretized expression for Equation 1 without the source term ''q(x,y,t;x_s,y_s)''. The
 +
model object, which we created earlier, already contains the squared discrete slowness <code>model.m</code> and damping term <code>model.damp</code> as <code>Function</code> objects:
  
The forward modeling step can be expressed mathematically as
+
<pre>
 +
pde = model.m * u.dt2 - u.laplace + model.damp * u.dt
 +
</pre>
  
((4))
+
If we write out the (second order) second time derivative <code>u.dt2</code> as shown earlier and ignore the damping term for the moment, our <code>pde</code> expression translates to the following discrete the wave equation:
which is expressed in Julia as
 
  
This forward models all 16 predicted shot records in parallel. Notice that, in instantiating Ainv, we made the wave equation solver implicitly dependent on model0.
+
{{NumBlk|:|<math>\frac{\mathbf{m}}{dt^{2}}(\mathbf{u}[time-dt]-2\mathbf{u}[time]+\mathbf{u}[time+dt])-\Delta \mathbf{u}[time]=0, time=1\cdots n_{t-1}</math>|{{EquationRef|2}}}}
  
Finally, we set up the matrix-free Jacobian operator J and the Gauss–Newton Hessian J′ ∗ J. As mentioned in the introduction, J is the partial derivative of the forward modeling operator F(m; q) with respect to the model m and is therefore directly constructed from our modeling operator Pr ∗ Ainv ∗ Ps′ and a specified source vector q:
+
with time being the current time step and dt being the time stepping interval. To propagate the wavefield, we rearrange to obtain an expression for the wavefield '''u'''(time+dt) at the next time step. Ignoring the damping term once again, this yields:
  
 +
{{NumBlk|:|<math>\mathbf{u}[time+dt]=2\mathbf{u}[time]-\mathbf{u}[time-dt])+\frac{dt^{2}}{\mathbf{m}}\Delta \mathbf{u}[time]</math>|{{EquationRef|3}}}}
  
In the context of seismic inversion, the Jacobian is also called the linearized modeling or demigration operator, and its adjoint J′ is the migration operator. One drawback of this notation is that the forward wavefields for the gradient calculation have to be recomputed since the forward modeling operator only returns the shot records and not the complete wavefields. For this reason, JUDI has an additional function for computing the gradients of the FWI objective function f,g = fwi_objective(model0, q[i], d_obs[i]), which takes the current model, source and data vectors as an input and computes the objective value and gradient in parallel without having to recompute the forward wavefields.
+
We can rearrange our <code>pde</code> expression automatically using the SymPy utility function <code>solve</code>, then create an expression which defines the update of the wavefield for the new time step '''u'''(time+dt), with the command <code>u.forward</code>:
  
== FWI via gradient descent ==
+
<pre>
With expressions for modeling operators, Jacobians and gradients of the FWI objective, we can now implement different FWI algorithms in a few lines of code. We will start with a basic gradient descent example with a line search. To reduce the computational cost of full gradient descent, we will use a stochastic approach in which we only compute the gradient and function value for a randomized subset of source locations. In JUDI, this is accomplished by choosing a random vector of integers between 1 and 16 and indexing the data vectors as described earlier. Furthermore, we will apply a projection operator proj(x), which prevent velocities (or squared slownesses) becoming negative or too large by clipping values outside the allowed range.
+
stencil = Eq(u.forward, solve(pde, u.forward)[0])
 +
</pre>
  
A few extra variables are defined in the notebook, but the full algorithm for FWI with stochastic gradient descent and box constraints is implemented as follows:
+
<code>stencil</code> represents the finite-difference approximation derived from Equation 3, including the finite-difference approximation of the Laplacian and the damping term.
  
 +
Although it defines the update for a single time step only, Devito knows that we will be solving a time-dependent problem over a number of time steps because the wavefield u is a <code>TimeFunction</code> object.
  
JUDI's backtracking_linesearch function performs an approximate line search and returns a model update that leads to a decrease of the objective function value (Armijo condition; Nocedal and Wright, 2009). The result after 10 iterations of SGD with box constraints is shown in Figure 2. In practice, where starting models are typically less accurate than in our example, FWI is often performed from low to high frequencies, since the objective function has less local minima for lower frequencies (Bunks et al., 1995). In this multiscale FWI approach, a low-pass-filtered version of the data is used to invert for a low-resolution velocity model first, and higher frequencies are added in subsequent iterations.
+
== Setting up the acquisition geometry ==
 +
The expression for time stepping we derived in the previous section does not contain a seismic source function yet, so the update for the wavefield at a new time step is solely defined by the two previous wavefields. However as indicated in Equation 1, wavefields for seismic experiments are often excited by an active (impulsive) source ''q(x,y,t;x_s)'', which is a function of space and time (just like the wavefield '''u'''). To include such a source term in our modeling scheme, we simply add the the source wavefield as an additional term to Equation 3:
  
[[File:Tle37020142.1 fig1.gif|thumb|500px|Figure 1. Observed shot record number 8.]]
+
{{NumBlk|:|<math>\mathbf{u}[time+dt]=2\mathbf{u}[time]-\mathbf{u}[time-dt])+\frac{dt^{2}}{\mathbf{m}}(\Delta \mathbf{u}[time]+\mathbf{q}[time])</math>|{{EquationRef|4}}}}
[[File:Tle37020142.1 fig2.gif|thumb|500px|Figure 2. (a) Initial model. (b) Recovered velocity model after 10 iterations of stochastic gradient descent with box constraints and a batch size of eight shots. (c) Recovered velocity model after 10 iterations of the Gauss–Newton method, with six iterations of LSQR for the Gauss–Newton subproblem, and using all shots in every iteration. The resulting misfit is substantially lower than what was achieved with gradient descent.]]
 
  
== FWI via the Gauss–Newton method ==
+
Because the source appears on the right-hand side in the original equation (Equation 1), the term also needs to be multiplied with dt^2/'''m''' (this follows from rearranging Equation 2, with the source on the right-hand side in place of 0). Unlike the discrete wavefield '''u''' however, the source '''q''' is typically localized in space and only a function of time, which means the time-dependent source wavelet is injected into the propagating wavefield at a specified source location. The same applies when we sample the wavefield at receiver locations to simulate a shot record, i.e. the simulated wavefield needs to be sampled at specified receiver locations only. Source and receiver both do not necessarily coincide with the modeling grid.
As discussed earlier, the convergence rate of GD depends on the objective function, but requires many FWI iterations necessary to reach an acceptable solution. Using our matrix-free operator for the Jacobian J, we can modify the above code to implement the Gauss–Newton method (equation 3) to improve the convergence rate. In practice, directly inverting the Gauss–Newton Hessian J′ ∗ J should be avoided, because the matrix is badly conditioned and takes many iterations to invert. Instead, we perform a few iterations of a least-squares solver, lsqr(), to approximately solve J ∗ p = d_pred - d_obs and obtain the update direction p. lsqr, from the Julia IterativeSolvers package, is a conjugate-gradient type algorithm for solving least-squares problems and is mathematically equivalent to inverting J′ ∗ J, but has better numerical properties (Paige and Saunders, 1982). We implement the Gauss-Newton method as follows:
 
  
In contrast to our SGD algorithm, we use all shot records in every iteration, since stochastic methods for second-order algorithms are less well understood, making this approach considerably more expensive than our previous algorithm. However, as shown in Figures 2 and 3, it achieves a superior result, with a considerably lower misfit compared to the known model. Furthermore, Figure 3 shows that it achieves the improved result in relatively few iterations.
+
Here, <code>RickerSource</code> acts as a wrapper around <code>SparseFunction</code> and models a Ricker wavelet with a peak frequency <code>f0</code> and source coordinates <code>src_coords</code>:
 +
<pre>
 +
f0 = 0.010  # kHz, peak frequency.
 +
src = RickerSource(name='src', grid=model.grid, f0=f0,
 +
                  time=time, coordinates=src_coords)
 +
</pre>
 +
The <code>src.inject</code> function now injects the current time sample of the Ricker wavelet (weighted with dt^2/'''m''' as shown in Equation 4) into the updated wavefield <code>u.forward</code> at the specified coordinates.
  
[[File:Tle37020142.1 fig3.gif|thumb|500px|Figure 3. Normalized function values for the FWI inversion example with stochastic gradient descent and the Gauss-Newton method.]]
+
<pre>
 +
src_term = src.inject(field=u.forward,
 +
                      expr=src * dt**2 / model.m,
 +
                      offset=model.nbpml)
 +
</pre>
  
An alternative to (Gauss–)Newton methods are quasi-Newton methods, which build up an approximation of the Hessian from previous gradients only and require no additional PDE solves or matrix inversions. Implementing an efficient and correct version of this method, such as the L-BFGS algorithm, exceeds a few lines of code, and we therefore leave this exercise to the reader. Instead of implementing more complicated algorithms by hand, it is also possible to interface third-party Julia optimization libraries; an example for this is given in the notebook fwi_example_NLopt.ipynb.
+
To extract the wavefield at a predetermined set of receiver locations, there is a corresponding wrapper function for receivers as well, which creates a <code>SparseFunction</code> object for a given number <code>npoint</code> of receivers, number <code>nt</code> of time samples, and specified receiver coordinates <code>rec_coords</code>:
  
Even though all examples shown here are two-dimensional, to make them reproducible on a laptop or desktop PC, JUDI can be used for 3D modeling and inversion without having to change the code, since the number of dimensions are automatically inferred from the velocity model and data dimensions.
+
<pre>
 +
rec = Receiver(name='rec', npoint=101, ntime=nt,
 +
              grid=model.grid, coordinates=rec_coords)
 +
</pre>
 +
 
 +
Rather than injecting a function into the model as we did for the source, we now simply save the wavefield at the grid points that correspond to receiver positions and interpolate the data to their exact possibly of the computational grid location:
 +
 
 +
<pre>
 +
rec_term = rec.interpolate(u, offset=model.nbpml)
 +
</pre>
 +
 
 +
== Forward simulation ==
 +
We can now define our forward propagator by adding the source and receiver terms to our stencil object:
 +
 
 +
<pre>
 +
op_fwd = Operator([stencil] + src_term + rec_term)
 +
</pre>
 +
 
 +
The symbolic expressions used to create <code>Operator</code> contain sufficient meta-information for Devito to create a fully functional computational kernel. The dimension symbols contained in the symbolic function object <code>(time, x, y)</code> define the loop structure of the created code,while allowing Devito to automatically optimize the underlying loop structure to increase execution speed.
 +
 
 +
The size of the loops and spacing between grid points is inferred from the symbolic <code>Function</code> objects and associated <code>model</code>. <code>grid</code> object at run-time. As a result, we can invoke the generated kernel through a simple Python function call by supplying the number of time steps <code>time</code> and the time-step size <code>dt</code>. The user data associated with each <code>Function</code> is updated in-place during operator execution, allowing us to extract the final wavefield and shot record directly from the symbolic function objects without unwanted memory duplication:
 +
 
 +
<pre>
 +
op_fwd(time=nt, dt=model.critical_dt)
 +
</pre>
 +
 
 +
When this has finished running, the resulting wavefield is stored in <code>u.data</code> and the shot record is in <code>rec.data</code>. We can easily plot this 2D array as an image, as shown in Figure 2.
 +
 
 +
[[File:Tle36121033.1 fig2.gif|thumb|500px|center|Figure 2. The shot record generated by Devito for the example velocity model.
 +
As demonstrated in the notebook, a movie of snapshots of the forward wavefield can also be generated by capturing the wavefield at discrete time steps. Figure 3 shows three timesteps from the movie.]]
 +
 
 +
[[File:Tle36121033.1 fig3.gif|thumb|500px|center|Figure 3. Three time steps from the wavefield simulation that resulted in the shot record in Figure 2. You can generate an animated version in the Notebook at github.com/seg.]]
  
 
== Conclusions ==
 
== Conclusions ==
In this final part of our FWI tutorial series, we demonstrated how to set up basic optimization algorithms for waveform inversion using JUDI. The methods shown here are all gradient based and differ in the way update directions for the velocity model are computed. Our numerical examples can serve for the reader as a basis for developing more advanced FWI workflows, which usually include additional data preprocessing, frequency continuation techniques, or further model constraints.
+
In this first part of the tutorial, we have demonstrated how to set up the discretized forward acoustic wave equations and associated wave propagator with run-time code generation. While we limited our discussion to the constant density acoustic wave equation, Devito is capable of handling more general wave equations, but this is a topic beyond this tutorial on simulating waves for inversion. In [[Full-waveform inversion, Part 2: Adjoint modeling|Part 2]] of our tutorial, we will show how to calculate a valid gradient of the FWI objective using the adjoint-state method. In Part 3, we will demonstrate how to set up a complete matrix-free and scalable optimization framework for acoustic FWI.
  
 
== Acknowledgments ==
 
== Acknowledgments ==
Line 83: Line 145:
  
 
==Related links==
 
==Related links==
* [[Full-waveform inversion 1: forward modeling]]
+
* [[Full-waveform inversion, Part 2: Adjoint modeling]]
*[[Full-waveform inversion, Part 2: Adjoint modeling]]
 
  
 
== References ==
 
== References ==

Revision as of 15:41, 5 February 2018

Since its re-introduction by Pratt (1999)[1], full-waveform inversion (FWI) has gained a lot of attention in geophysical exploration because of its ability to build high resolution velocity models more or less automatically in areas of complex geology. While there is an extensive and growing literature on the topic, publications focus mostly on technical aspects, making this topic inaccessible for a broader audience due to the lack of simple introductory resources for newcomers to geophysics. We will accomplish this by providing a hands-on walkthrough of FWI using Devito, a system based on domain-specific languages that automatically generates code for time-domain finite-differences.[2]

As usual, this tutorial is accompanied by all the code you need to reproduce the figures. Go to http://github.com/seg/tutorials-2017 and follow the links. In the Notebook, we describe how to simulate synthetic data for a specified source and receiver setup and how to save the corresponding wavefields and shot records. In Part 2 of this series, we will address how to calculate model updates, i.e. gradients of the FWI objective function, via adjoint modeling. Finally, in Part 3 we will demonstrate how to use this gradient as part of an optimization framework for inverting an unknown velocity model.

Introduction

Devito provides a concise and straightforward computational framework for discretizing wave equations, which underlie all FWI frameworks. We will show that it generates verifiable executable code at run time for wave propagators associated with forward and (in Part 2) adjoint wave equations. Devito frees the user from the recurrent and time-consuming development of performant time-stepping codes and allows the user to concentrate on the geophysics of the problem rather than on low-level implementation details of wave-equation simulators. This tutorial covers the conventional adjoint-state formulation of full-waveform tomography[3] that underlies most of the current methods referred to as full-waveform inversion.[4] While other formulations have been developed to improve the convergence of FWI for poor starting models, in these tutorials we will concentrate on the standard formulation that relies on the combination of a forward/adjoint pair of propagators and a correlation-based gradient. In part one of this tutorial, we discuss how to set up wave simulations for inversion, including how to express the wave equation in Devito symbolically and how to deal with the acquisition geometry.

What is FWI?

FWI tries to iteratively minimize the difference between data that was acquired in a seismic survey and synthetic data that is generated from a wave simulator with an estimated (velocity) model of the subsurface. As such, each FWI framework essentially consists of a wave simulator for forward modeling the predicted data and an adjoint simulator for calculating a model update from the data misfit. This first part of this tutorial is dedicated to the forward modeling part and demonstrates how to discretize and implement the acoustic wave equation using Devito.

Wave simulations for inversion

The acoustic wave equation with the squared slowness , defined as with being the unknown spatially varying wavespeed, is given by:


(1)

where is the Laplace operator, is the seismic source, located at and is a space-dependent dampening parameter for the absorbing boundary layer.[5] As shown in Figure 1, the physical model is extended in every direction by nbpml grid points to mimic an infinite domain. The dampening term attenuates the waves in the dampening layer and prevents waves from reflecting at the model boundaries. In Devito, the discrete representations of and are contained in a model object that contains a grid object with all relevant information such as the origin of the coordinate system, grid spacing, size of the model and dimensions time, x, y:

model = Model(vp=v,              # A velocity model.
              origin=(0, 0),     # Top left corner.
              shape=(101, 101),  # Number of grid points.
              spacing=(10, 10),  # Grid spacing in m.
              nbpml=40)          # boundary layer.
Figure 1: (a) Diagram showing the model domain, with the perfectly matched layer (PML) as an absorbing layer to attenuate the wavefield at the model boundary. (b) The example model used in this tutorial, with the source and receivers indicated. The grid lines show the cell boundaries.

In the Model instantiation, vp is the velocity in km/s, origin is the origin of the physical model in meters, spacing is the discrete grid spacing in meters, shape is the number of grid points in each dimension and nbpml is the number of grid points in the absorbing boundary layer. Is is important to note that shape is the size of the physical domain only, while the total number of grid points, including the absorbing boundary layer, will be automatically derived from shape and nbpml.

Symbolic definition of the wave propagator

To model seismic data by solving the acoustic wave equation, the first necessary step is to discretize this partial differential equation (PDE), which includes discrete representations of the velocity model and wavefields, as well as approximations of the spatial and temporal derivatives using finite-differences (FD). Unfortunately, implementing these finite-difference schemes in low-level code by hand is error prone, especially when we want performant and reliable code. The primary design objective of Devito is to allow users to define complex matrix-free finite-difference approximations from high-level symbolic definitions, while employing automated code generation to create highly optimized low-level C code. Using the symbolic algebra package SymPy to facilitate the automatic creation of derivative expressions, Devito generates computationally efficient wave propagators.[6]

At the core of Devito's symbolic API are symbolic types that behave like SymPy function objects, while also managing data:

  • Function objects represent a spatially varying function discretized on a regular Cartesian grid. For example, a function symbol f = Function(name='f', grid=model.grid, space_order=2) is denoted symbolically as f(x, y). The objects provide auto-generated symbolic expressions for finite-difference derivatives through shorthand expressions like f.dx and f.dx2 for the first and second derivative in x.
  • TimeFunction objects represent a time-dependent function that has time as the leading dimension, for example g(time, x, y). In addition to spatial derivatives TimeFunction symbols also provide time derivatives g.dt and g.dt2.
  • SparseFunction objects represent sparse components, such as sources and receivers, which are usually distributed sparsely and often located off the computational grid — these objects also therefore handle interpolation onto the model grid.

To demonstrate Devito's symbolic capabilities, let us consider a time-dependent function u(time,x,y) representing the discrete forward wavefield:

u = TimeFunction(name="u", grid=model.grid, 
                 time_order=2, space_order=2,
                 save=True, time_dim=nt)

where the grid object provided by the model defines the size of the allocated memory region, time_order and space_order define the default discretization order of the derived derivative expressions.

We can now use this symbolic representation of our wavefield to generate simple discretized expressions for finite-difference derivative approximations using shorthand expressions, such as u.dt and u.dt2 to denote du/dt and (d^2 u)/(dt^2 ) respectively:

>>> u.dt
-u(time - dt, x, y)/(2*dt) + u(time + dt, x, y)/(2*dt)
>>> u.dt2
-2*u(time, x, y)/dt**2 + u(time - dt, x, y)/dt**2 + u(time + dt, x, y)/dt**2

Using the automatic derivation of derivative expressions, we can now implement a discretized expression for Equation 1 without the source term q(x,y,t;x_s,y_s). The model object, which we created earlier, already contains the squared discrete slowness model.m and damping term model.damp as Function objects:

pde = model.m * u.dt2 - u.laplace + model.damp * u.dt

If we write out the (second order) second time derivative u.dt2 as shown earlier and ignore the damping term for the moment, our pde expression translates to the following discrete the wave equation:


(2)

with time being the current time step and dt being the time stepping interval. To propagate the wavefield, we rearrange to obtain an expression for the wavefield u(time+dt) at the next time step. Ignoring the damping term once again, this yields:


(3)

We can rearrange our pde expression automatically using the SymPy utility function solve, then create an expression which defines the update of the wavefield for the new time step u(time+dt), with the command u.forward:

stencil = Eq(u.forward, solve(pde, u.forward)[0])

stencil represents the finite-difference approximation derived from Equation 3, including the finite-difference approximation of the Laplacian and the damping term.

Although it defines the update for a single time step only, Devito knows that we will be solving a time-dependent problem over a number of time steps because the wavefield u is a TimeFunction object.

Setting up the acquisition geometry

The expression for time stepping we derived in the previous section does not contain a seismic source function yet, so the update for the wavefield at a new time step is solely defined by the two previous wavefields. However as indicated in Equation 1, wavefields for seismic experiments are often excited by an active (impulsive) source q(x,y,t;x_s), which is a function of space and time (just like the wavefield u). To include such a source term in our modeling scheme, we simply add the the source wavefield as an additional term to Equation 3:


(4)

Because the source appears on the right-hand side in the original equation (Equation 1), the term also needs to be multiplied with dt^2/m (this follows from rearranging Equation 2, with the source on the right-hand side in place of 0). Unlike the discrete wavefield u however, the source q is typically localized in space and only a function of time, which means the time-dependent source wavelet is injected into the propagating wavefield at a specified source location. The same applies when we sample the wavefield at receiver locations to simulate a shot record, i.e. the simulated wavefield needs to be sampled at specified receiver locations only. Source and receiver both do not necessarily coincide with the modeling grid.

Here, RickerSource acts as a wrapper around SparseFunction and models a Ricker wavelet with a peak frequency f0 and source coordinates src_coords:

f0 = 0.010  # kHz, peak frequency.
src = RickerSource(name='src', grid=model.grid, f0=f0,
                   time=time, coordinates=src_coords)

The src.inject function now injects the current time sample of the Ricker wavelet (weighted with dt^2/m as shown in Equation 4) into the updated wavefield u.forward at the specified coordinates.

src_term = src.inject(field=u.forward,
                      expr=src * dt**2 / model.m,
                      offset=model.nbpml)

To extract the wavefield at a predetermined set of receiver locations, there is a corresponding wrapper function for receivers as well, which creates a SparseFunction object for a given number npoint of receivers, number nt of time samples, and specified receiver coordinates rec_coords:

rec = Receiver(name='rec', npoint=101, ntime=nt,
               grid=model.grid, coordinates=rec_coords)

Rather than injecting a function into the model as we did for the source, we now simply save the wavefield at the grid points that correspond to receiver positions and interpolate the data to their exact possibly of the computational grid location:

rec_term = rec.interpolate(u, offset=model.nbpml)

Forward simulation

We can now define our forward propagator by adding the source and receiver terms to our stencil object:

op_fwd = Operator([stencil] + src_term + rec_term)

The symbolic expressions used to create Operator contain sufficient meta-information for Devito to create a fully functional computational kernel. The dimension symbols contained in the symbolic function object (time, x, y) define the loop structure of the created code,while allowing Devito to automatically optimize the underlying loop structure to increase execution speed.

The size of the loops and spacing between grid points is inferred from the symbolic Function objects and associated model. grid object at run-time. As a result, we can invoke the generated kernel through a simple Python function call by supplying the number of time steps time and the time-step size dt. The user data associated with each Function is updated in-place during operator execution, allowing us to extract the final wavefield and shot record directly from the symbolic function objects without unwanted memory duplication:

op_fwd(time=nt, dt=model.critical_dt)

When this has finished running, the resulting wavefield is stored in u.data and the shot record is in rec.data. We can easily plot this 2D array as an image, as shown in Figure 2.

Figure 2. The shot record generated by Devito for the example velocity model. As demonstrated in the notebook, a movie of snapshots of the forward wavefield can also be generated by capturing the wavefield at discrete time steps. Figure 3 shows three timesteps from the movie.
Figure 3. Three time steps from the wavefield simulation that resulted in the shot record in Figure 2. You can generate an animated version in the Notebook at github.com/seg.

Conclusions

In this first part of the tutorial, we have demonstrated how to set up the discretized forward acoustic wave equations and associated wave propagator with run-time code generation. While we limited our discussion to the constant density acoustic wave equation, Devito is capable of handling more general wave equations, but this is a topic beyond this tutorial on simulating waves for inversion. In Part 2 of our tutorial, we will show how to calculate a valid gradient of the FWI objective using the adjoint-state method. In Part 3, we will demonstrate how to set up a complete matrix-free and scalable optimization framework for acoustic FWI.

Acknowledgments

This research was carried out as part of the SINBAD II project with the support of the member organizations of the SINBAD Consortium. This work was financially supported in part by EPSRC grant EP/L000407/1 and the Imperial College London Intel Parallel Computing Centre.

Related links

References

  1. Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model: GEOPHYSICS, 64, 888–901. http://dx.doi.org/10.1190/1.1444597
  2. Lange, M., Kukreja, N., Louboutin, M., Luporini, F., Zacarias, F. V., Pandolfo, V., Gorman, G., 2016, Devito: Towards a generic finite difference DSL using symbolic python: 6th workshop on python for high-performance and scientific computing. http://dx.doi.org/10.1109/PyHPC.2016.9
  3. Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: GEOPHYSICS, 49, 1259–1266. http://dx.doi.org/10.1190/1.1441754
  4. Virieux, J., and Operto, S., 2009, An overview of full-waveform inversion in exploration geophysics: GEOPHYSICS, 74, WCC1–WCC26. http://dx.doi.org/10.1190/1.3238367
  5. Cerjan, C., Kosloff, D., Kosloff, R., and Reshef, M., 1985, A nonreflecting boundary condition for discrete acoustic and elastic wave equations: GEOPHYSICS, 50, 705–708. http://dx.doi.org/10.1190/1.1441945
  6. Meurer A, Smith CP, Paprocki M, et al., 2017, SymPy: symbolic computing in Python. PeerJ Computer Science 3:e103 http://dx.doi.org/10.7717/peerj-cs.103

Corresponding authors

  • Corresponding author: Mathias Louboutin, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia, mloubout‐at‐eoas.ubc.ca
  • Philipp Witte, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia
  • Michael Lange, Imperial College London, London, UK
  • Navjot Kukreja, Imperial College London, London, UK
  • Fabio Luporini, Imperial College London, London, UK
  • Gerard Gorman, Imperial College London, London, UK
  • Felix J. Herrmann, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia, now at Georgia Institute of Technology, USA

External links

find literature about
Full-waveform inversion, Part 1: Forward modeling
SEG button search.png Datapages button.png GeoScienceWorld button.png OnePetro button.png Schlumberger button.png Google button.png AGI button.png