The function of interpolation

From SEG Wiki
Jump to: navigation, search

When interpreting seismic or well data in desktop software, it’s easy to forget that the data are discretely sampled in space and time. Even when you look at an apparently continuous log curve, there are really only samples, typically every 0.1524 m (6 inches). When you display seismic data as wiggle traces, there are perhaps samples only every 2 or 4 ms in the time dimension. Some software lets you turn off the interpolation so you can see the discrete data samples, as in Figure 1a.

So what happens when we want to read the amplitude from a seismic trace, but we are between two samples? For example, maybe we want to know the amplitude at the exact time where the seabed horizon crosses the trace. Then, the software has to either select the nearest sample or perform some kind of interpolation, perhaps linear interpolation or by fitting some kind of curve to the data. Not surprisingly, the method you choose affects the result you get.

Figure 1. (a) A close-up of the seafloor reflector and interpreted horizon. The seismic samples are like the pixels of a raster image. The horizon does not have the same degree of time quantization and is much smoother. At the blue trace, the event has a time of 45.4 samples. (b) An amplitude extraction along the red horizon. The results from three different interpolation methods are shown, along with computation time in ms, and the root-mean-square error compared to the solution given by the cubic spline fit.

I’ve drawn a fictitious trace in Figure 2. The black dots are the samples actually stored in the data. The black curve is the cubic spline interpolation of the data points. This is a piecewise third-order polynomial function that fits the data points, called “knots.” It’s what we might think of as the most natural interpolation, but it is computationally expensive to fit. The blue curve is the linear interpolation and matches quite well on most limbs of the curve. In the accompanying Jupyter Notebook, you can see some speed tests on this and the other examples in this tutorial — in this case the linear interpolation is about 3.5 times faster than the spline fitting. Finally, the green curve is a nearest neighbor estimate — only a little faster than the linear interpolation to compute, but clearly quite inaccurate and with abrupt variations.

It’s all very well playing with artificial data, but we’d really like to apply what we’ve learned to real data. Suppose we wish to extract amplitude along a horizon from a 3D seismic volume. Without our knowledge of interpolation, we typically might have simply indexed into the volume, taking the nearest sample to the one we need. This is equivalent to sampling from the green curve in Figure 2. As we saw, it’s fast, but not very accurate. Let’s instead create an interpolation and sample from that instead.

Figure 2. A seismic trace in unitless time with discrete amplitude samples (black dots). Three different interpolations are shown as continuous lines. The “horizon” at t = 7.544 is at the maximum absolute amplitude of the trace.

The interpolation function

Most scientific computing frameworks make it easy to create interpolation functions. The core Python scientific library, scipy, has various methods in the scipy.interpolate module, which is a Python wrapper for the Fortran library FITPACK by Alan Cline of UT Austin.

Given a trace — an array of amplitudes — I’d like to be able to look up arbitrary times. First I have to make a timescale; let’s say these 10 samples occur every 2 ms, starting at zero:

>>> amps = np.array([0,1,3,0,-2,-5,-2,2,5,0])

>>> time = np.linspace(0, 18, amps.size)

>>> print(time)

[ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18.]

Now I can create an interpolator using SciPy's interp1d method:

>>> from scipy.interpolate import interp1d

>>> f = interp1d(time, amps, kind='linear')

The interp1d function returns another function, an interpolation function I’ve called f. I can use f to look up the amplitude at arbitrary times, say 3.141. Notice that this time occurs between the second and third amplitude values, which are 1 and 3 respectively. Intuitively, given the slope of 1 in this interval, the answer should be 2.141:

>>> print(f(3.141))

2.141

It works! Now I can keep my discretely sampled points, but look up any value in between them, using various interpolation methods: nearest, linear, and any order of spline curve from zero (which is a bit like nearest but instead returns the previous sample), to first order (which is the same as linear), to second order (or quadratic), to third order (cubic), and beyond.

The result of applying three of these methods to the horizon in Figure 1a is shown in Figure 1b, along with computation time and RMS error. As expected, the linear solution is much faster than the cubic spline but comes with a substantial error. Interestingly, the error on the nearest method is similar to that on the linear interpolation — a glance at t = 13.5 shows why this is the case for a horizon right on a reflector.

Beyond 1D interpolation

There are occasions when we would like to be able to treat the seismic data volume as a continuous basis in every dimension, not just in time. For example, in order to find the seismic amplitude along a wellbore, we would need a 3D interpolation function. Again, scipy.interpolate has a ready-made way to do this, called RegularGridInterpolator. It works exactly like interp1d, in that we instantiate it with linear scales representing the dimensions of the data we want to sample, and we get back an interpolating function, which I'll call g:

>>> # The x dimension goes 0-12 km in 481 xlines

>>> x = np.linspace(0, 12000, 481) # m

>>> y = np.linspace(0, 7500, 601) # m

>>> t = np.linspace(0, 1000, 251) # ms

>>> g = RegularGridInterpolator((x, y, t), seismic)

Now I can get the amplitude from anywhere in the volume seismic, using real-world units, linearly interpolated from the local samples. To get the amplitude at x = 123, y = 456, t = 789:

>>> amplitude = g([123, 456, 789])

array([-852.9604])

Notice that the result is an array — hinting that I can pass any number of points into this interpolator. Have a look at the accompanying notebook at [1] to see the result for a wellbore.

Now, if I can settle for linear interpolation, I can even use this interpolator instead of my trace-by-trace method, making it much easier to get at the amplitude for an entire horizon, as in Figure 3. Because it was implemented by someone very clever, it’s much faster too: the full horizon takes about 50 ms on my machine, compared to over 20 s for looping through 1D interpolations.

Figure 3. The horizon slice at the seabed, computed with linear sample interpolation.

Further reading

You can find all the data and code for this tutorial at [2]. And you can read all about scipy.interpolate at [3]. I explored some of these ideas in a series of blog posts, starting with An attribute analysis primer in June 2015, at [4].

Acknowledgments

Thanks are due to Alessandro Amato del Monte for the inspiration to dig further into interpolation methods in SciPy. Thanks also to Evan Bianco for improvements to the manuscript and notebook.

Corresponding author