Difference between revisions of "Full-waveform inversion, Part 1: Forward modeling"

From SEG Wiki
Jump to: navigation, search
m (removed $)
(added link to part 3)
 
(44 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Exploring nonlinear inversions: A 1D magnetotelluric example
+
{{Tutorial|TLE|December 2017}}
{{Tutorial|TLE|August 2017}}
+
Since its re-introduction by Pratt (1999)<ref>Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model: GEOPHYSICS, 64, 888–901. {{doi|10.1190/1.1444597}}</ref>, [[full-waveform inversion|full-waveform inversion (FWI)]] has gained a lot of attention in geophysical exploration because of its ability to build high resolution velocity models more or less automatically in areas of complex [[geology]]. While there is an extensive and growing literature on the topic, publications focus mostly on technical aspects, making this topic inaccessible for a broader audience due to the lack of simple introductory resources for newcomers to [[geophysics]]. We will accomplish this by providing a hands-on walkthrough of FWI using Devito, a system based on domain-specific languages that automatically generates code for time-domain finite-differences.<ref>Lange, M., Kukreja, N., Louboutin, M., Luporini, F., Zacarias, F. V., Pandolfo, V., Gorman, G., 2016, Devito: Towards a generic finite difference DSL using symbolic python: 6th workshop on python for high-performance and scientific computing. {{doi|10.1109/PyHPC.2016.9}}</ref>
At some point in many geophysical workflows, an inversion is a necessary step for answering the geoscientific question at hand, whether it is recovering a reflectivity series from a seismic trace in a deconvolution problem, finding a susceptibility model from magnetic data, or recovering conductivity from an electromagnetic survey. This is particularly true when working with data sets where it may not even be clear how to plot the data: 3D direct current resistivity and induced polarization surveys (it is not necessarily clear how to organize data into a pseudosection) or multicomponent data, such as electromagnetic data (we can measure three spatial components of electric and/or magnetic fields through time over a range of frequencies). Inversion is a tool for translating these data into a model we can interpret. The goal of the inversion is to find a “model” — some description of the earth's physical properties — that is consistent with both the data and geologic knowledge.
 
  
In a general inverse problem, we start from a forward problem, of the form <math>\mathcal{F}[\mathbf{m}] = \mathbf{d}</math>, where <math>\mathcal{F}</math> is the forward operator (the mathematical description of the physics/problem), '''d''' is our data, and '''m''' is our earth model (an array of numbers that describes the physical properties of the earth). Matt Hall kicked off the discussion of inversions in ''The Leading Edge'' in his Linear Inversion tutorial.<ref>Hall, M., 2016, Linear inversion: The Leading Edge, '''35''', no. 12, 1085–1087, {{doi|10.1190/tle35121085.1}}</ref> He walked through how to solve the classic linear inverse problem in which the forward simulation takes the form <math>\mathcal{F}[\mathbf{m}] = \mathbf{G}\mathbf{m} = \mathbf{d}</math>. The example he demonstrated is a deconvolution problem; in that case, '''G''' is a convolution matrix, '''m''' is the reflectivity series, and '''d''' is a seismic trace. He introduced the concepts of an underdetermined problem, motivated the need for regularization, formulated the inversion in terms of an optimization problem, and solved the linear inverse problem (in true polyglot fashion, using Python, Lua, Julia, and R). In this tutorial, we will pick up from there and explore a <math>\emph{nonlinear}</math> forward problem, of the form <math>\mathcal{F}[\mathbf{m}] = \mathbf{d}</math>; in this case, our forward operator is a function of the model. In the accompanying notebooks (https://github.com/seg), we use SimPEG (http://simpeg.xyz) for the implementation in Python of the physics simulations, optimization, and structure necessary to perform an inversion.<ref>Cockett, R., S. Kang, L. J. Heagy, A. Pidlisecky, and D. W. Oldenburg, 2015, SimPEG: An open source framework for simulation and gradient based parameter estimation in geophysical applications: Computers & Geosciences, '''85''', 142–154, {{doi|10.1016/j.cageo.2015.09.015}}</ref>
+
As usual, this tutorial is accompanied by all the code you need to reproduce the figures. Go to http://github.com/seg/tutorials-2017 and follow the links. In the Notebook, we describe how to simulate synthetic data for a specified source and receiver setup and how to save the corresponding wavefields and shot records. In [[Full-waveform inversion, Part 2: Adjoint modeling|Part 2]] of this series, we will address how to calculate model updates, i.e. gradients of the FWI objective function, via adjoint modeling. Finally, in [[Full-waveform inversion, Part 3: Optimization|Part 3]] we will demonstrate how to use this gradient as part of an optimization framework for inverting an unknown [[velocity model]].
  
== Magnetotellurics ==
+
== Introduction ==
We will explore the 1D magnetotelluric (MT) survey technique, which is a natural source electromagnetic method. In MT, plane-wave source fields are generated by solar wind (giving us low-frequency signals <1 Hz) and lightning strikes worldwide (giving us higher frequency signals >1 Hz). In MT, the model '''m''' is a description of the earth's electrical conductivity <math>\sigma</math>, and <math>\mathcal{F}[\mathbf{m}]</math> solves Maxwell's equations — giving the electric field <math>\vec{E}</math> and the magnetic field <math>\vec{H}</math> — for a plane-wave source. In the continuous world, Maxwell's equations are:
+
Devito provides a concise and straightforward computational framework for discretizing wave equations, which underlie all FWI frameworks. We will show that it generates verifiable executable code at run time for wave propagators associated with forward and (in [[Full-waveform inversion, Part 2: Adjoint modeling|Part 2]]) adjoint wave equations. Devito frees the user from the recurrent and time-consuming development of performant time-stepping codes and allows the user to concentrate on the geophysics of the problem rather than on low-level implementation details of wave-equation simulators. This tutorial covers the conventional adjoint-state formulation of full-waveform tomography<ref>Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: GEOPHYSICS, 49, 1259–1266. {{doi|10.1190/1.1441754}}</ref> that underlies most of the current methods referred to as full-waveform inversion.<ref>Virieux, J., and Operto, S., 2009, An overview of full-waveform inversion in exploration geophysics: GEOPHYSICS, 74, WCC1–WCC26. {{doi|10.1190/1.3238367}}</ref> While other formulations have been developed to improve the convergence of FWI for poor starting models, in these tutorials we will concentrate on the standard formulation that relies on the combination of a forward/adjoint pair of propagators and a correlation-based gradient. In part one of this tutorial, we discuss how to set up wave simulations for inversion, including how to express the wave equation in Devito symbolically and how to deal with the acquisition geometry.
  
{{NumBlk|:|<math>\nabla \times \vec{E} + i\omega\mu\vec{H}= 0</math>
+
== What is FWI? ==
 +
FWI tries to iteratively minimize the difference between data that was acquired in a seismic survey and synthetic data that is generated from a wave simulator with an estimated (velocity) model of the subsurface. As such, each FWI framework essentially consists of a wave simulator for forward modeling the predicted data and an adjoint simulator for calculating a model update from the data misfit. This first part of this tutorial is dedicated to the forward modeling part and demonstrates how to discretize and implement the acoustic wave equation using [http://www.opesci.org/devito/ Devito].
  
<math>\nabla \times \vec{H} + \sigma\vec{E}= 0</math>
+
== Wave simulations for inversion ==
 +
The acoustic wave equation with the squared slowness <math>m</math>, defined as <math>m(x,y)=c^{-2} (x,y)</math> with <math>c(x,y)</math> being the unknown spatially varying wavespeed, is given by:
  
<math>\text{with boundary conditions}</math>
+
{{NumBlk|:|<math>m\frac{d^{2}u(t,x,y)}{dt^{2}}-\Delta u(t,x,y)+\eta\frac{du(t,x,y)}{dt}=q(t,x,y;x_{s},y_{s})</math>|{{EquationRef|1}}}}
  
|{{EquationRef|1}}}}
+
where <math>\Delta</math> is the Laplace operator, <math>q(t,x,y;x_s,y_s)</math> is the seismic source, located at <math>(x_s,y_s)</math> and <math>\eta(x,y)</math> is a space-dependent dampening parameter for the absorbing boundary layer.<ref>Cerjan, C., Kosloff, D., Kosloff, R., and Reshef, M., 1985, A nonreflecting boundary condition for discrete acoustic and elastic wave equations: GEOPHYSICS, 50, 705–708. {{doi|10.1190/1.1441945}}</ref> As shown in Figure 1, the physical model is extended in every direction by <code>nbpml</code> grid points to mimic an infinite domain. The dampening term <math>\eta du/dt</math> attenuates the waves in the dampening layer and prevents waves from reflecting at the model boundaries. In Devito, the discrete representations of <math>m</math> and <math>\eta</math> are contained in a <code>model</code> object that contains a grid object with all relevant information such as the origin of the coordinate system, grid spacing, size of the model and dimensions <code>time, x, y</code>:
where <math>\mu</math> is magnetic permeability, generally taken to be the permeability of free space <math>\mu_0</math>, and <math>\omega = 2 \pi f</math> is the angular frequency (<math>f</math> is frequency in Hz). When we discretize Maxwell's equations so they can be solved numerically (see the first notebook), we obtain a matrix system of the form
 
  
{{NumBlk|:|<math>\mathbf{A}(\mathbf{m})\mathbf{u} = \mathbf{b}</math>|{{EquationRef|2}}}}
+
<pre>
 +
model = Model(vp=v,              # A velocity model.
 +
              origin=(0, 0),    # Top left corner.
 +
              shape=(101, 101),  # Number of grid points.
 +
              spacing=(10, 10),  # Grid spacing in m.
 +
              nbpml=40)         # boundary layer.
 +
</pre>
  
where '''A(m)''' is a matrix capturing the physics, '''u''' is a vector of the electric and magnetic fields everywhere in our simulation domain, and '''b''' includes the boundary conditions that describe the plane wave source. This problem is nonlinear because this matrix '''A''' depends on the conductivity model. When we solve equation 2 for '''u''', we obtain the electric and magnetic fields everywhere on our mesh. Our data consist of samples of the electric and magnetic fields where we have receivers. For the MT problem, the measured data, '''d''', are impedances; for the 1D problem, the impedance Zxy at a single frequency is given by
+
[[File:Tle36121033.1 fig1.gif|thumb|center|500px|Figure 1: (a) Diagram showing the model domain, with the perfectly matched layer (PML) as an absorbing layer to attenuate the wavefield at the model boundary. (b) The example model used in this tutorial, with the source and receivers indicated. The grid lines show the cell boundaries.]]
  
{{NumBlk|:|<math>Z_{xy} = -\frac{E_x}{H_y}.</math>|{{EquationRef|3}}}}
+
In the <code>Model</code> instantiation, <code>vp</code> is the velocity in km/s, <code>origin</code> is the origin of the physical model in meters, <code>spacing</code> is the discrete grid spacing in meters, <code>shape</code> is the number of grid points in each dimension and nbpml is the number of grid points in the absorbing boundary layer. Is is important to note that shape is the size of the physical domain only, while the total number of grid points, including the absorbing boundary layer, will be automatically derived from <code>shape</code> and <code>nbpml</code>.
  
Note that the impedance is a complex number, consisting of real and imaginary parts. Impedance is a nonintuitive quantity; often, we instead consider apparent resistivity2 ρa and phase ψ, given by
+
== Symbolic definition of the wave propagator ==
 +
To model seismic data by solving the acoustic wave equation, the first necessary step is to discretize this partial differential equation (PDE), which includes discrete representations of the velocity model and wavefields, as well as approximations of the spatial and temporal derivatives using finite-differences (FD). Unfortunately, implementing these finite-difference schemes in low-level code by hand is error prone, especially when we want performant and reliable code.
 +
The primary design objective of Devito is to allow users to define complex matrix-free finite-difference approximations from high-level symbolic definitions, while employing automated code generation to create highly optimized low-level C code. Using the symbolic algebra package SymPy to facilitate the automatic creation of derivative expressions, Devito generates computationally efficient wave propagators.<ref>Meurer A, Smith CP, Paprocki M, et al., 2017, SymPy: symbolic computing in Python. PeerJ Computer Science 3:e103 {{doi|10.7717/peerj-cs.103}}</ref>
  
{{NumBlk|:|<math>\rho_\mathrm{a} = \frac{1}{\mu_0\omega} \big|Z_{xy}\big|^2,  
+
At the core of Devito's symbolic API are symbolic types that behave like SymPy function objects, while also managing data:
\quad
+
* <code>Function</code> objects represent a spatially varying function discretized on a regular Cartesian grid. For example, a function symbol <code>f = Function(name='f', grid=model.grid, space_order=2)</code> is denoted symbolically as <code>f(x, y)</code>. The objects provide auto-generated symbolic expressions for finite-difference derivatives through shorthand expressions like <code>f.dx</code> and <code>f.dx2</code> for the first and second derivative in <code>x</code>.
\psi = \tan^{-1}\left(\frac{\text{Im}(Z_{xy})}{\text{Re}(Z_{xy})}\right).
+
* <code>TimeFunction</code> objects represent a time-dependent function that has time as the leading dimension, for example <code>g(time, x, y)</code>. In addition to spatial derivatives <code>TimeFunction</code> symbols also provide time derivatives <code>g.dt</code> and <code>g.dt2</code>.
</math>|{{EquationRef|4}}}}
+
* <code>SparseFunction</code> objects represent sparse components, such as sources and receivers, which are usually distributed sparsely and often located off the computational grid — these objects also therefore handle interpolation onto the model grid.
  
For an earth that is a half-space, the apparent resistivity equals the true resistivity, and the phase is 45°. When we implement the computation of the data, we define a method that: (1) selects the values of the electric and magnetic fields at the surface of the earth from the fields that we computed everywhere in our domain '''u''' and (2) computes their ratio to provide us with impedance data, that is: '''P(u) = d'''.
+
To demonstrate Devito's symbolic capabilities, let us consider a time-dependent function '''u'''(time,''x,y'') representing the discrete forward wavefield:
  
In summary, to implement the forward simulation for the MT problem (ℱ['''m'''] '''= d'''), we break it into two steps:
+
<pre>
 +
u = TimeFunction(name="u", grid=model.grid,
 +
                time_order=2, space_order=2,
 +
                save=True, time_dim=nt)
 +
</pre>
  
# solve '''A(m)u = b'''; and
+
where the grid object provided by the model defines the size of the allocated memory region, <code>time_order</code> and <code>space_order</code> define the default discretization order of the derived derivative expressions.
# compute the impedance data '''d = P(u)'''.
 
  
In the first notebook, we provide details on how each step is performed using a finite difference approach. If you are looking for more on numerical discretization, we wrote a tutorial on finite volume methods.<ref>Cockett, R., L. J. Heagy, and D. W. Oldenburg, 2016, Pixels and their neighbors: Finite volume: The Leading Edge, 35, no. 8, 703–706, {{doi|10.1190/tle35080703.1.}}</ref>
+
We can now use this symbolic representation of our wavefield to generate simple discretized expressions for finite-difference derivative approximations using shorthand expressions, such as <code>u.dt</code> and <code>u.dt2</code> to denote ''du/dt'' and ''(d^2 u)/(dt^2 )'' respectively:
  
The inversion aims to solve ℱ−1[d] for a model. Just as in the linear problem, we require regularization to select a model from the infinitely many that can fit the data. Before we tackle this ill-posed inverse problem, let's explore an example of nonuniqueness: how can different models give us the same data?
+
<pre>
 +
>>> u.dt
 +
-u(time - dt, x, y)/(2*dt) + u(time + dt, x, y)/(2*dt)
 +
>>> u.dt2
 +
-2*u(time, x, y)/dt**2 + u(time - dt, x, y)/dt**2 + u(time + dt, x, y)/dt**2
 +
</pre>
  
== Go forwards ==
+
Using the automatic derivation of derivative expressions, we can now implement a discretized expression for Equation 1 without the source term ''q(x,y,t;x_s,y_s)''. The
A classic example that demonstrates the nonuniqueness of MT data is the equivalence of the conductivity-thickness product (conductance) of a thin layer. If we start with a layer that has a conductivity of σ, halve its thickness, and double its conductivity, the resulting data will be similar. In Figure 1, we show apparent resistivity and phase data for five models, each of which has the same conductance. In all of the simulations, the data show a decrease in apparent resistivity and an increase in phase starting at ∼10 Hz. Thus, in all of the data we have evidence of a conductive layer, and the frequency range at which it appears is an indicator of the <code>depth</code> of the layer (you can explore by changing the depth variable in the model setup of the second notebook). However, all scenarios produce similar data. Even with a small amount of noise, we cannot expect an inversion code to separate the conductivity and thickness of a conductive unit without incorporating additional information. When setting up the inverse problem and defining regularization (next up), it is important to realize that the choices we make there will influence the character of the model we recover, as the data alone do not provide us with a unique model.
+
model object, which we created earlier, already contains the squared discrete slowness <code>model.m</code> and damping term <code>model.damp</code> as <code>Function</code> objects:
  
[[File:Tle36080696.1 fig1.gif|thumb|450px|Figure 1. MT responses from five models, each having an equivalent conductivity-thickness product for the conductive layer. (a) Conductivity models, (b) apparent resistivity (ρa), and (c) phase (ϕ).]]
+
<pre>
 +
pde = model.m * u.dt2 - u.laplace + model.damp * u.dt
 +
</pre>
  
== Go backwards ==
+
If we write out the (second order) second time derivative <code>u.dt2</code> as shown earlier and ignore the damping term for the moment, our <code>pde</code> expression translates to the following discrete the wave equation:
There are many models that can fit the data, so we need a means of ''regularizing'' our inversion so that we can select a single, reasonable model from the (infinitely) many that agree with the data. We will come back to how the regularization ϕm is defined, but for now, consider it as a measure of how “reasonable” the model is based on our prior knowledge about the earth. Formally, we pose the inversion as an optimization problem:
 
  
{{NumBlk|:|<math>\underset{\mathbf{m}}{\text{minimize}} \quad \phi_\text{m}(\mathbf{m})</math>
+
{{NumBlk|:|<math>\frac{\mathbf{m}}{dt^{2}}(\mathbf{u}[time-dt]-2\mathbf{u}[time]+\mathbf{u}[time+dt])-\Delta \mathbf{u}[time]=0, time=1\cdots n_{t-1}</math>|{{EquationRef|2}}}}
  
<math>\text{subject to }\quad \mathcal{F}[\mathbf{m}] = \mathbf{d}_{\text{obs}}</math>
+
with time being the current time step and dt being the time stepping interval. To propagate the wavefield, we rearrange to obtain an expression for the wavefield '''u'''(time+dt) at the next time step. Ignoring the damping term once again, this yields:
|{{EquationRef|5}}}}
 
Essentially we are saying “find the model '''m''' that best fits the assumptions we are making in the regularization ϕm('''m''') and that agrees with the observed data dobs.” In practice, our data are noisy, so there is no sense in fitting them exactly. Rather, we pose the optimization problem as a trade-off between fitting the data and fitting the regularization, so our inverse problem can be stated as:
 
  
{{NumBlk|:|<math>\underset{\mathbf{m}}{\text{minimize}} \quad \phi(\mathbf{m}) = \phi_\mathrm{d}(\mathbf{m}) + \beta\phi_\mathrm{m}(\mathbf{m})</math>|{{EquationRef|6}}}}
+
{{NumBlk|:|<math>\mathbf{u}[time+dt]=2\mathbf{u}[time]-\mathbf{u}[time-dt])+\frac{dt^{2}}{\mathbf{m}}\Delta \mathbf{u}[time]</math>|{{EquationRef|3}}}}
  
where ϕd(m) is the data misfit, a measure of how far our simulated data are from the observed data; ϕm(m) is the regularization; and β is a trade-off parameter that weights the relative importance of the data misfit and regularization in the optimization. A larger β says that we want our model to do a good job minimizing the regularization, while a smaller β turns down the importance of the regularization and says that fitting the data is more important in the inversion.
+
We can rearrange our <code>pde</code> expression automatically using the SymPy utility function <code>solve</code>, then create an expression which defines the update of the wavefield for the new time step '''u'''(time+dt), with the command <code>u.forward</code>:
The data misfit is often taken to be a weighted l2 norm:
 
  
{{NumBlk|:|<math>\phi_d(\mathbf{m}) = \frac{1}{2}\|\mathbf{W_d} (\mathcal{F}(\mathbf{m}) - \mathbf{d}_{\text{obs}})\|^2_2</math>|{{EquationRef|7}}}}
+
<pre>
where Wd captures the noise model (typically it is a diagonal matrix containing the standard deviation of each datum).
+
stencil = Eq(u.forward, solve(pde, u.forward)[0])
The regularization is one place where a priori information about the geologic setting can be brought in. There are a variety of regularization functionals that can be chosen, but one of the most widely used is Tikhonov regularization, which again uses l2 norms:
+
</pre>
  
{{NumBlk|:|<math>\phi_m(\mathbf{m}) = \frac{1}{2}\big(\alpha_\mathrm{s}\|\mathbf{W_\mathrm{s}} (\mathbf{m} - \mathbf{m}_{\text{ref}})\|^2_2 + \alpha_\mathrm{z}\|\mathbf{W_\mathrm{z}} (\mathbf{m})\|^2_2 \big)</math>|{{EquationRef|8}}}}
+
<code>stencil</code> represents the finite-difference approximation derived from Equation 3, including the finite-difference approximation of the Laplacian and the damping term.  
The first term is often referred to as the “smallness” as it measures the “size” of the model (in the l2 sense). The matrix Ws is generally taken to be a diagonal matrix that may contain information about the length scales of the model or be used to weight the relative importance of various parameters in the model. The scalar αs weights the relative importance of this term in the regularization. Notice that we include a reference model, mref. Often this is defined as a constant value, but if more information is known about the background, that can be used to construct a more intricate reference model. Here, we will not delve too far into how the reference model impacts the recovered results, but you are encouraged to change <code>mref</code> in the notebooks and investigate its impact.
 
  
The second term is often referred to as the “smoothness.” The matrix Wz approximates the derivative of the model with respect to depth, and is hence a measure of how “smooth” the model is. The term αz weights the relative importance of smoothness in the regularization.
+
Although it defines the update for a single time step only, Devito knows that we will be solving a time-dependent problem over a number of time steps because the wavefield u is a <code>TimeFunction</code> object.
  
From this setup, we see that there are quite a number of choices to make: defining uncertainties on the data (Wd), selecting a reference model (mref), choosing the importance of smallness and smoothness (αs and αz), and selecting a trade-off parameter (β). Let's start by assuming a known noise model, fix αs and αz, and explore the impact of the trade-off parameter β. Our forward problem depends on the electrical conductivity. For the inverse problem, however, we are free to use any function of the conductivity as a parameter. The electrical conductivity of earth materials varies by many orders of magnitude and is strictly positive. Thus it is advantageous to use log(σ) as the model in the inverse problem. For a nonlinear problem, we also have the additional choice of the initial model m0 at which to start the inversion. Although we will not discuss the choice of m0, you are encouraged to change the initial model in the notebooks and examine the impact it makes because it can be significant.
+
== Setting up the acquisition geometry ==
 +
The expression for time stepping we derived in the previous section does not contain a seismic source function yet, so the update for the wavefield at a new time step is solely defined by the two previous wavefields. However as indicated in Equation 1, wavefields for seismic experiments are often excited by an active (impulsive) source ''q(x,y,t;x_s)'', which is a function of space and time (just like the wavefield '''u'''). To include such a source term in our modeling scheme, we simply add the the source wavefield as an additional term to Equation 3:
  
The β knob. If the noise is Gaussian, then the sum of squares (our data misfit) is a Chi-squared distribution, which has an expected value of Ndata (in our case, we divide this by two to match our definition of ϕd). Thus, the ideal choice of β is one that gives us φ∗d≈12Ndataφd∗≈12Ndata. To demonstrate the effect of β, we consider a five-layer model, originally shown in Whitall and Oldenburg (1992), and will demonstrate inversions when we achieve the target misfit, underfit the data, and overfit the data.<ref>Whittall, K., and D. Oldenburg, 1992, Inversion of magnetotelluric data for a one-dimensional conductivity: Society of Exploration Geophysicists. {{doi|10.1190/1.9781560802419}}</ref> The conductivity model used is the solid black line in Figure 2a. For these inversions we fix the regularization parameters to αs = 10−2, αz = 1 and set mref = log 10−2 S/m, and the initial model, m0 = mref (feel free to change them in the notebook). We start the inversion with a large β and decrease its value to plot the trade-off or Tikhonov curve (Figure 2b). In blue, we show the inversion that is stopped when the data misfit approximately equals the target misfit (the star in Figure 2b). Figures 2c and 2d show the data as apparent resistivity and phase, which is a visualization of our complex-valued impedance data. The blue line in Figure 2a shows the recovered model, which identifies the general structure and conductivity values of the five layers. In this case, we are employing a smooth regularization, thus we expect to recover smoothly varying structures.
+
{{NumBlk|:|<math>\mathbf{u}[time+dt]=2\mathbf{u}[time]-\mathbf{u}[time-dt])+\frac{dt^{2}}{\mathbf{m}}(\Delta \mathbf{u}[time]+\mathbf{q}[time])</math>|{{EquationRef|4}}}}
  
[[File:Tle36080696.1 fig2.gif|thumb|450px|Figure 2. Inversions that fit the data (blue), underfit the data (orange), and overfit the data (green). (a) True (black) and recovered electrical conductivity models. (b) Tikhonov curve showing the trade-off between the misfit and regularization, target misfit (red star), and achieved misfit corresponding to each of the inversion results shown (c) observed (black) and predicted apparent resistivity data, (d) observed (black) and predicted phase data.]]
+
Because the source appears on the right-hand side in the original equation (Equation 1), the term also needs to be multiplied with dt^2/'''m''' (this follows from rearranging Equation 2, with the source on the right-hand side in place of 0). Unlike the discrete wavefield '''u''' however, the source '''q''' is typically localized in space and only a function of time, which means the time-dependent source wavelet is injected into the propagating wavefield at a specified source location. The same applies when we sample the wavefield at receiver locations to simulate a shot record, i.e. the simulated wavefield needs to be sampled at specified receiver locations only. Source and receiver both do not necessarily coincide with the modeling grid.
  
If we instead choose a larger β, reducing the contribution of the data misfit to the objective function, we underfit the data, as is shown in orange in Figure 2. Although we still see evidence of two conductive structures, we do not recover their amplitudes and do a poor job resolving the location and widths of the conductive layers. (If you had to pick the top of the first layer, where should it be?) Examining the plots in Figures 2c and 2d, there is more insight about the subsurface conductivity that can be learned by pushing the inversion to extract more from the data.
+
Here, <code>RickerSource</code> acts as a wrapper around <code>SparseFunction</code> and models a Ricker wavelet with a peak frequency <code>f0</code> and source coordinates <code>src_coords</code>:
 +
<pre>
 +
f0 = 0.010  # kHz, peak frequency.
 +
src = RickerSource(name='src', grid=model.grid, f0=f0,
 +
                  time=time, coordinates=src_coords)
 +
</pre>
 +
The <code>src.inject</code> function now injects the current time sample of the Ricker wavelet (weighted with dt^2/'''m''' as shown in Equation 4) into the updated wavefield <code>u.forward</code> at the specified coordinates.
  
On the other extreme, we can choose a very small β and try to fit all of the details in the data. Doing this, we obtain the results shown in green in Figure 2. When we push the inversion to fit the (noisy) data very closely, we end up fitting the noise. To do this, conductivity contrasts are exaggerated and oscillatory and erroneous conductivity structures are introduced in the inversion.
+
<pre>
 +
src_term = src.inject(field=u.forward,
 +
                      expr=src * dt**2 / model.m,
 +
                      offset=model.nbpml)
 +
</pre>
  
The α knobs. For the inversions shown in Figure 2, we prescribed the values αs, αz. What impact do they have on the character of the model we recover?
+
To extract the wavefield at a predetermined set of receiver locations, there is a corresponding wrapper function for receivers as well, which creates a <code>SparseFunction</code> object for a given number <code>npoint</code> of receivers, number <code>nt</code> of time samples, and specified receiver coordinates <code>rec_coords</code>:
  
In Figure 3, we compare two inversions with different regularization parameters: (1) a “smooth” inversion (blue line) with αs = 10−5 and αz = 1 and (2) a “small” inversion (orange line) with αs = 1 and αz = 10−5. In both, β was chosen so that a desired target misfit was achieved. The smooth inversion penalizes large gradients; the resulting model has two smooth peaks. Note that we smooth over the resistive third layer, overestimating its conductivity. The small inversion instead favors models that are close to the reference model; this model has more structure. The resistivity of the first layer matches well, and the conductivity of the third layer is closer to its true value, but additional oscillatory structures are introduced at depth. In the third notebook, you can explore the impact of these parameters yourself.
+
<pre>
 +
rec = Receiver(name='rec', npoint=101, ntime=nt,
 +
              grid=model.grid, coordinates=rec_coords)
 +
</pre>
  
[[File:Tle36080696.1 fig3.gif|thumb|450px|Figure 3. Comparing the use of smooth regularization versus small regularization in the inversion.]]
+
Rather than injecting a function into the model as we did for the source, we now simply save the wavefield at the grid points that correspond to receiver positions and interpolate the data to their exact possibly of the computational grid location:
  
In practice, these parameters are often determined by experimentation; strategies such as examining length scales are often successfully adopted (see page 38 in Oldenburg and Li, 2005)<ref name=oldenburg>Oldenburg, D. W., and Y. Li, 2005, Inversion for applied geophysics: A tutorial, ''in'' D. K. Butler, ed., Investigations in Geophysics: Near-Surface Geophysics, 89–150, {{doi|10.1190/1.9781560801719.ch5}}</ref>. Changing the relative values of αs and αz is one way to bring in a priori information. If we know very little, often starting with a smooth inversion is a good option; this penalizes structure (high gradients) while showing general trends. If more structure is expected, or a reliable reference model can be built from additional data such as physical property measurements, well logs, or additional geophysical/geologic data, then the influence of the smallness term may be increased. There are a few other ways to bring in additional a priori information. If we are expecting a more “blocky” model, we can choose a different norm (such as an l1 norm), or if we have structural constraints, we can introduce other weighting structures (e.g., on the smoothness); these are knobs for another tutorial, and there is discussion in Oldenburg and Li (2005).<ref name=oldenburg />
+
<pre>
 +
rec_term = rec.interpolate(u, offset=model.nbpml)
 +
</pre>
  
== Summary ==
+
== Forward simulation ==
In this tutorial, we have introduced the forward simulation for MT and explored a few aspects of the inverse problem. Prior to jumping into an inversion, it is important to know the limitations of the survey and data, and what you can and cannot resolve, even if there is no noise. Forward modeling is a powerful tool for setting realistic expectations of an inversion.
+
We can now define our forward propagator by adding the source and receiver terms to our stencil object:
  
To set up and solve the inverse problem, we posed the inversion as an optimization problem that searches for a model of the earth that minimizes an objective function consisting of a data misfit and a regularization term. There are many choices to be made in defining the various elements of the inverse problem, including how to assign uncertainties, selecting a trade-off parameter, defining the regularization function, and choosing initial and reference models. In this tutorial we explored two of the knobs: (1) the trade-off parameter and (2) the relative importance of smallness and smoothness contributions in Tikhonov regularization. The interactive notebooks that are provided allow you to change parameters and experiment with their impact.
+
<pre>
 +
op_fwd = Operator([stencil] + src_term + rec_term)
 +
</pre>
 +
 
 +
The symbolic expressions used to create <code>Operator</code> contain sufficient meta-information for Devito to create a fully functional computational kernel. The dimension symbols contained in the symbolic function object <code>(time, x, y)</code> define the loop structure of the created code,while allowing Devito to automatically optimize the underlying loop structure to increase execution speed.
 +
 
 +
The size of the loops and spacing between grid points is inferred from the symbolic <code>Function</code> objects and associated <code>model</code>. <code>grid</code> object at run-time. As a result, we can invoke the generated kernel through a simple Python function call by supplying the number of time steps <code>time</code> and the time-step size <code>dt</code>. The user data associated with each <code>Function</code> is updated in-place during operator execution, allowing us to extract the final wavefield and shot record directly from the symbolic function objects without unwanted memory duplication:
 +
 
 +
<pre>
 +
op_fwd(time=nt, dt=model.critical_dt)
 +
</pre>
 +
 
 +
When this has finished running, the resulting wavefield is stored in <code>u.data</code> and the shot record is in <code>rec.data</code>. We can easily plot this 2D array as an image, as shown in Figure 2.
 +
 
 +
[[File:Tle36121033.1 fig2.gif|thumb|500px|center|Figure 2. The shot record generated by Devito for the example velocity model.
 +
As demonstrated in the notebook, a movie of snapshots of the forward wavefield can also be generated by capturing the wavefield at discrete time steps. Figure 3 shows three timesteps from the movie.]]
 +
 
 +
[[File:Tle36121033.1 fig3.gif|thumb|500px|center|Figure 3. Three time steps from the wavefield simulation that resulted in the shot record in Figure 2. You can generate an animated version in the Notebook at github.com/seg.]]
 +
 
 +
== Conclusions ==
 +
In this first part of the tutorial, we have demonstrated how to set up the discretized forward acoustic wave equations and associated wave propagator with run-time code generation. While we limited our discussion to the constant density acoustic wave equation, Devito is capable of handling more general wave equations, but this is a topic beyond this tutorial on simulating waves for inversion. In [[Full-waveform inversion, Part 2: Adjoint modeling|Part 2]] of our tutorial, we will show how to calculate a valid gradient of the FWI objective using the adjoint-state method. In [[Full-waveform inversion, Part 3: Optimization|Part 3]], we will demonstrate how to set up a complete matrix-free and scalable optimization framework for acoustic FWI.
 +
 
 +
== Acknowledgments ==
 +
This research was carried out as part of the SINBAD II project with the support of the member organizations of the SINBAD Consortium. This work was financially supported in part by EPSRC grant EP/L000407/1 and the Imperial College London Intel Parallel Computing Centre.
 +
 
 +
==Related links==
 +
* [[Full-waveform inversion, Part 2: Adjoint modeling]]
 +
* [[Full-waveform inversion, Part 3: Optimization]]
  
 
== References ==
 
== References ==
 
{{reflist}}
 
{{reflist}}
  
== Corresponding author ==
+
== Corresponding authors ==
* Corresponding author: Seogi Kang, University of British Columbia Geophysical Inversion Facility, {{Email|skang|eos.ubc.ca}}
+
* Corresponding author: Mathias Louboutin, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia, {{Email|mloubout|eoas.ubc.ca}}
 +
* Philipp Witte, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia
 +
* Michael Lange, Imperial College London, London, UK
 +
* Navjot Kukreja, Imperial College London, London, UK
 +
* Fabio Luporini, Imperial College London, London, UK
 +
* Gerard Gorman, Imperial College London, London, UK
 +
* Felix J. Herrmann, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia, now at Georgia Institute of Technology, USA
  
 
==External links==
 
==External links==
 
{{search}}
 
{{search}}
[[:Category:Tutorials]]
+
 
 +
[[Category:Tutorials]]

Latest revision as of 08:27, 26 February 2018

Since its re-introduction by Pratt (1999)[1], full-waveform inversion (FWI) has gained a lot of attention in geophysical exploration because of its ability to build high resolution velocity models more or less automatically in areas of complex geology. While there is an extensive and growing literature on the topic, publications focus mostly on technical aspects, making this topic inaccessible for a broader audience due to the lack of simple introductory resources for newcomers to geophysics. We will accomplish this by providing a hands-on walkthrough of FWI using Devito, a system based on domain-specific languages that automatically generates code for time-domain finite-differences.[2]

As usual, this tutorial is accompanied by all the code you need to reproduce the figures. Go to http://github.com/seg/tutorials-2017 and follow the links. In the Notebook, we describe how to simulate synthetic data for a specified source and receiver setup and how to save the corresponding wavefields and shot records. In Part 2 of this series, we will address how to calculate model updates, i.e. gradients of the FWI objective function, via adjoint modeling. Finally, in Part 3 we will demonstrate how to use this gradient as part of an optimization framework for inverting an unknown velocity model.

Introduction

Devito provides a concise and straightforward computational framework for discretizing wave equations, which underlie all FWI frameworks. We will show that it generates verifiable executable code at run time for wave propagators associated with forward and (in Part 2) adjoint wave equations. Devito frees the user from the recurrent and time-consuming development of performant time-stepping codes and allows the user to concentrate on the geophysics of the problem rather than on low-level implementation details of wave-equation simulators. This tutorial covers the conventional adjoint-state formulation of full-waveform tomography[3] that underlies most of the current methods referred to as full-waveform inversion.[4] While other formulations have been developed to improve the convergence of FWI for poor starting models, in these tutorials we will concentrate on the standard formulation that relies on the combination of a forward/adjoint pair of propagators and a correlation-based gradient. In part one of this tutorial, we discuss how to set up wave simulations for inversion, including how to express the wave equation in Devito symbolically and how to deal with the acquisition geometry.

What is FWI?

FWI tries to iteratively minimize the difference between data that was acquired in a seismic survey and synthetic data that is generated from a wave simulator with an estimated (velocity) model of the subsurface. As such, each FWI framework essentially consists of a wave simulator for forward modeling the predicted data and an adjoint simulator for calculating a model update from the data misfit. This first part of this tutorial is dedicated to the forward modeling part and demonstrates how to discretize and implement the acoustic wave equation using Devito.

Wave simulations for inversion

The acoustic wave equation with the squared slowness , defined as with being the unknown spatially varying wavespeed, is given by:


(1)

where is the Laplace operator, is the seismic source, located at and is a space-dependent dampening parameter for the absorbing boundary layer.[5] As shown in Figure 1, the physical model is extended in every direction by nbpml grid points to mimic an infinite domain. The dampening term attenuates the waves in the dampening layer and prevents waves from reflecting at the model boundaries. In Devito, the discrete representations of and are contained in a model object that contains a grid object with all relevant information such as the origin of the coordinate system, grid spacing, size of the model and dimensions time, x, y:

model = Model(vp=v,              # A velocity model.
              origin=(0, 0),     # Top left corner.
              shape=(101, 101),  # Number of grid points.
              spacing=(10, 10),  # Grid spacing in m.
              nbpml=40)          # boundary layer.
Figure 1: (a) Diagram showing the model domain, with the perfectly matched layer (PML) as an absorbing layer to attenuate the wavefield at the model boundary. (b) The example model used in this tutorial, with the source and receivers indicated. The grid lines show the cell boundaries.

In the Model instantiation, vp is the velocity in km/s, origin is the origin of the physical model in meters, spacing is the discrete grid spacing in meters, shape is the number of grid points in each dimension and nbpml is the number of grid points in the absorbing boundary layer. Is is important to note that shape is the size of the physical domain only, while the total number of grid points, including the absorbing boundary layer, will be automatically derived from shape and nbpml.

Symbolic definition of the wave propagator

To model seismic data by solving the acoustic wave equation, the first necessary step is to discretize this partial differential equation (PDE), which includes discrete representations of the velocity model and wavefields, as well as approximations of the spatial and temporal derivatives using finite-differences (FD). Unfortunately, implementing these finite-difference schemes in low-level code by hand is error prone, especially when we want performant and reliable code. The primary design objective of Devito is to allow users to define complex matrix-free finite-difference approximations from high-level symbolic definitions, while employing automated code generation to create highly optimized low-level C code. Using the symbolic algebra package SymPy to facilitate the automatic creation of derivative expressions, Devito generates computationally efficient wave propagators.[6]

At the core of Devito's symbolic API are symbolic types that behave like SymPy function objects, while also managing data:

  • Function objects represent a spatially varying function discretized on a regular Cartesian grid. For example, a function symbol f = Function(name='f', grid=model.grid, space_order=2) is denoted symbolically as f(x, y). The objects provide auto-generated symbolic expressions for finite-difference derivatives through shorthand expressions like f.dx and f.dx2 for the first and second derivative in x.
  • TimeFunction objects represent a time-dependent function that has time as the leading dimension, for example g(time, x, y). In addition to spatial derivatives TimeFunction symbols also provide time derivatives g.dt and g.dt2.
  • SparseFunction objects represent sparse components, such as sources and receivers, which are usually distributed sparsely and often located off the computational grid — these objects also therefore handle interpolation onto the model grid.

To demonstrate Devito's symbolic capabilities, let us consider a time-dependent function u(time,x,y) representing the discrete forward wavefield:

u = TimeFunction(name="u", grid=model.grid, 
                 time_order=2, space_order=2,
                 save=True, time_dim=nt)

where the grid object provided by the model defines the size of the allocated memory region, time_order and space_order define the default discretization order of the derived derivative expressions.

We can now use this symbolic representation of our wavefield to generate simple discretized expressions for finite-difference derivative approximations using shorthand expressions, such as u.dt and u.dt2 to denote du/dt and (d^2 u)/(dt^2 ) respectively:

>>> u.dt
-u(time - dt, x, y)/(2*dt) + u(time + dt, x, y)/(2*dt)
>>> u.dt2
-2*u(time, x, y)/dt**2 + u(time - dt, x, y)/dt**2 + u(time + dt, x, y)/dt**2

Using the automatic derivation of derivative expressions, we can now implement a discretized expression for Equation 1 without the source term q(x,y,t;x_s,y_s). The model object, which we created earlier, already contains the squared discrete slowness model.m and damping term model.damp as Function objects:

pde = model.m * u.dt2 - u.laplace + model.damp * u.dt

If we write out the (second order) second time derivative u.dt2 as shown earlier and ignore the damping term for the moment, our pde expression translates to the following discrete the wave equation:


(2)

with time being the current time step and dt being the time stepping interval. To propagate the wavefield, we rearrange to obtain an expression for the wavefield u(time+dt) at the next time step. Ignoring the damping term once again, this yields:


(3)

We can rearrange our pde expression automatically using the SymPy utility function solve, then create an expression which defines the update of the wavefield for the new time step u(time+dt), with the command u.forward:

stencil = Eq(u.forward, solve(pde, u.forward)[0])

stencil represents the finite-difference approximation derived from Equation 3, including the finite-difference approximation of the Laplacian and the damping term.

Although it defines the update for a single time step only, Devito knows that we will be solving a time-dependent problem over a number of time steps because the wavefield u is a TimeFunction object.

Setting up the acquisition geometry

The expression for time stepping we derived in the previous section does not contain a seismic source function yet, so the update for the wavefield at a new time step is solely defined by the two previous wavefields. However as indicated in Equation 1, wavefields for seismic experiments are often excited by an active (impulsive) source q(x,y,t;x_s), which is a function of space and time (just like the wavefield u). To include such a source term in our modeling scheme, we simply add the the source wavefield as an additional term to Equation 3:


(4)

Because the source appears on the right-hand side in the original equation (Equation 1), the term also needs to be multiplied with dt^2/m (this follows from rearranging Equation 2, with the source on the right-hand side in place of 0). Unlike the discrete wavefield u however, the source q is typically localized in space and only a function of time, which means the time-dependent source wavelet is injected into the propagating wavefield at a specified source location. The same applies when we sample the wavefield at receiver locations to simulate a shot record, i.e. the simulated wavefield needs to be sampled at specified receiver locations only. Source and receiver both do not necessarily coincide with the modeling grid.

Here, RickerSource acts as a wrapper around SparseFunction and models a Ricker wavelet with a peak frequency f0 and source coordinates src_coords:

f0 = 0.010  # kHz, peak frequency.
src = RickerSource(name='src', grid=model.grid, f0=f0,
                   time=time, coordinates=src_coords)

The src.inject function now injects the current time sample of the Ricker wavelet (weighted with dt^2/m as shown in Equation 4) into the updated wavefield u.forward at the specified coordinates.

src_term = src.inject(field=u.forward,
                      expr=src * dt**2 / model.m,
                      offset=model.nbpml)

To extract the wavefield at a predetermined set of receiver locations, there is a corresponding wrapper function for receivers as well, which creates a SparseFunction object for a given number npoint of receivers, number nt of time samples, and specified receiver coordinates rec_coords:

rec = Receiver(name='rec', npoint=101, ntime=nt,
               grid=model.grid, coordinates=rec_coords)

Rather than injecting a function into the model as we did for the source, we now simply save the wavefield at the grid points that correspond to receiver positions and interpolate the data to their exact possibly of the computational grid location:

rec_term = rec.interpolate(u, offset=model.nbpml)

Forward simulation

We can now define our forward propagator by adding the source and receiver terms to our stencil object:

op_fwd = Operator([stencil] + src_term + rec_term)

The symbolic expressions used to create Operator contain sufficient meta-information for Devito to create a fully functional computational kernel. The dimension symbols contained in the symbolic function object (time, x, y) define the loop structure of the created code,while allowing Devito to automatically optimize the underlying loop structure to increase execution speed.

The size of the loops and spacing between grid points is inferred from the symbolic Function objects and associated model. grid object at run-time. As a result, we can invoke the generated kernel through a simple Python function call by supplying the number of time steps time and the time-step size dt. The user data associated with each Function is updated in-place during operator execution, allowing us to extract the final wavefield and shot record directly from the symbolic function objects without unwanted memory duplication:

op_fwd(time=nt, dt=model.critical_dt)

When this has finished running, the resulting wavefield is stored in u.data and the shot record is in rec.data. We can easily plot this 2D array as an image, as shown in Figure 2.

Figure 2. The shot record generated by Devito for the example velocity model. As demonstrated in the notebook, a movie of snapshots of the forward wavefield can also be generated by capturing the wavefield at discrete time steps. Figure 3 shows three timesteps from the movie.
Figure 3. Three time steps from the wavefield simulation that resulted in the shot record in Figure 2. You can generate an animated version in the Notebook at github.com/seg.

Conclusions

In this first part of the tutorial, we have demonstrated how to set up the discretized forward acoustic wave equations and associated wave propagator with run-time code generation. While we limited our discussion to the constant density acoustic wave equation, Devito is capable of handling more general wave equations, but this is a topic beyond this tutorial on simulating waves for inversion. In Part 2 of our tutorial, we will show how to calculate a valid gradient of the FWI objective using the adjoint-state method. In Part 3, we will demonstrate how to set up a complete matrix-free and scalable optimization framework for acoustic FWI.

Acknowledgments

This research was carried out as part of the SINBAD II project with the support of the member organizations of the SINBAD Consortium. This work was financially supported in part by EPSRC grant EP/L000407/1 and the Imperial College London Intel Parallel Computing Centre.

Related links

References

  1. Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model: GEOPHYSICS, 64, 888–901. http://dx.doi.org/10.1190/1.1444597
  2. Lange, M., Kukreja, N., Louboutin, M., Luporini, F., Zacarias, F. V., Pandolfo, V., Gorman, G., 2016, Devito: Towards a generic finite difference DSL using symbolic python: 6th workshop on python for high-performance and scientific computing. http://dx.doi.org/10.1109/PyHPC.2016.9
  3. Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: GEOPHYSICS, 49, 1259–1266. http://dx.doi.org/10.1190/1.1441754
  4. Virieux, J., and Operto, S., 2009, An overview of full-waveform inversion in exploration geophysics: GEOPHYSICS, 74, WCC1–WCC26. http://dx.doi.org/10.1190/1.3238367
  5. Cerjan, C., Kosloff, D., Kosloff, R., and Reshef, M., 1985, A nonreflecting boundary condition for discrete acoustic and elastic wave equations: GEOPHYSICS, 50, 705–708. http://dx.doi.org/10.1190/1.1441945
  6. Meurer A, Smith CP, Paprocki M, et al., 2017, SymPy: symbolic computing in Python. PeerJ Computer Science 3:e103 http://dx.doi.org/10.7717/peerj-cs.103

Corresponding authors

  • Corresponding author: Mathias Louboutin, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia, mloubout‐at‐eoas.ubc.ca
  • Philipp Witte, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia
  • Michael Lange, Imperial College London, London, UK
  • Navjot Kukreja, Imperial College London, London, UK
  • Fabio Luporini, Imperial College London, London, UK
  • Gerard Gorman, Imperial College London, London, UK
  • Felix J. Herrmann, Seismic Laboratory for Imaging and Modeling (SLIM), The University of British Columbia, now at Georgia Institute of Technology, USA

External links

find literature about
Full-waveform inversion, Part 1: Forward modeling
SEG button search.png Datapages button.png GeoScienceWorld button.png OnePetro button.png Schlumberger button.png Google button.png AGI button.png