# User:Ageary/Testing page


## Math and tables

Given a continuous function x(t) of a single variable t, its Fourier transform is defined by the integral

 ${\displaystyle X(\omega )=\int _{-\infty }^{+\infty }x(t)\;\exp(-i\omega t)\;dt,}$ (13)

where ω is the Fourier dual of the variable t. If t signifies time, then ω is angular frequency. The temporal frequency f is related to the angular frequency ω by ω = 2πf.

The Fourier transform is reversible; that is, given X(ω), the corresponding time function is

 ${\displaystyle x(t)=\int _{-\infty }^{+\infty }X(\omega )\;\exp(i\omega t)\;d\omega .}$ (14)

Throughout this book, the following sign convention is used for the Fourier transform. For the forward transform, the sign of the argument in the exponent is negative if the variable is time and positive if the variable is space. The inverse transform then takes the opposite sign to that used in the corresponding forward transform. For convenience, the scale factor 2π in equations (13) and (14) is omitted.
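As a quick numerical sketch of the forward transform in equation (13) with this sign convention (the test function, grid, and tolerance below are illustrative choices, not from the text): for the Gaussian x(t) = exp(−t²), the analytic transform is X(ω) = √π exp(−ω²/4), which a discrete approximation of the integral reproduces closely.

```python
import numpy as np

# Discrete approximation of equation (13), forward transform with exp(-i*omega*t).
# Test function: x(t) = exp(-t^2), whose analytic transform is sqrt(pi)*exp(-omega^2/4).
t = np.linspace(-10, 10, 4001)
dt = t[1] - t[0]
x = np.exp(-t**2)

omega = np.linspace(-5, 5, 101)
# Riemann-sum approximation of the integral for each omega.
X = np.sum(x * np.exp(-1j * np.outer(omega, t)), axis=1) * dt

X_analytic = np.sqrt(np.pi) * np.exp(-omega**2 / 4)
print(np.max(np.abs(X - X_analytic)) < 1e-6)  # True
```

The Gaussian decays so fast that truncating the integral at ±10 costs essentially nothing; for slowly decaying functions, a wider grid would be needed.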

Generally, X(ω) is a complex function. Using the properties of complex functions, X(ω) can be expressed in terms of two other functions of frequency

 ${\displaystyle X(\omega )=A(\omega )\;\;\exp[i\phi (\omega )],}$ (15)

where A(ω) and ϕ(ω) are the amplitude and phase spectra, respectively. They are computed by the following equations:

 ${\displaystyle A(\omega )={\sqrt {X_{r}^{2}(\omega )+X_{i}^{2}(\omega )}}}$ (16)

and

 ${\displaystyle \phi (\omega )=\tan ^{-1}{\frac {X_{i}(\omega )}{X_{r}(\omega )}},}$ (17)

where Xr(ω) and Xi(ω) are the real and imaginary parts of the Fourier transform X(ω). When X(ω) is expressed in terms of its real and imaginary components

 ${\displaystyle X(\omega )=X_{r}(\omega )+iX_{i}(\omega ),}$ (18)

and is compared with equation (15), note that

 ${\displaystyle {X_{r}}(\omega )=A(\omega )\;\cos \phi (\omega ),}$ (19)

and

 ${\displaystyle {X_{i}}(\omega )=A(\omega )\;\sin \phi (\omega ).}$ (20)
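As a quick numerical illustration of equations (16) through (20) (a sketch, not from the text), using NumPy's quadrant-aware arctan2 to evaluate equation (17):

```python
import numpy as np

# A single complex value X(omega) = Xr + i*Xi, as in equation (18).
Xr, Xi = 3.0, 4.0

A = np.sqrt(Xr**2 + Xi**2)   # equation (16): amplitude spectrum
phi = np.arctan2(Xi, Xr)     # equation (17), quadrant-aware form of tan^-1(Xi/Xr)

print(A)  # 5.0
# Equations (19) and (20): recover the real and imaginary parts.
print(np.isclose(Xr, A * np.cos(phi)), np.isclose(Xi, A * np.sin(phi)))  # True True
```

Note that arctan2 resolves the quadrant ambiguity that a plain arctangent of the ratio Xi/Xr would leave unresolved.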

We now consider two functions — x(t) and f(t). Listed in Table A-1 are basic theorems that are useful in various applications of the Fourier transform.

**Table A-1.** Fourier transform theorems.

| Operation | Time Domain | Frequency Domain |
| --- | --- | --- |
| (1) Shifting | x(t − τ) | exp(−iωτ) X(ω) |
| (2) Scaling | x(at) | ${\displaystyle {\left\vert a\right\vert }^{-1}X\left(\omega /a\right)}$ |
| (3) Differentiation | dx(t)/dt | iωX(ω) |
| (4) Addition | f(t) + x(t) | F(ω) + X(ω) |
| (5) Multiplication | f(t) x(t) | F(ω) * X(ω) |
| (6) Convolution | f(t) * x(t) | F(ω) X(ω) |
| (7) Autocorrelation | x(t) * x(−t) | ${\displaystyle {\left\vert X(\omega )\right\vert }^{2}}$ |
| (8) Parseval’s theorem | ${\displaystyle \int {\left\vert x(t)\right\vert }^{2}\ dt}$ | ${\displaystyle \int {\left\vert X(\omega )\right\vert }^{2}\ d\omega }$ |
 * denotes convolution.

Proofs of these theorems can be found in the classic reference on Fourier transforms by Bracewell (1965)[1]. Also, some of the proofs are left to the exercises at the end of this chapter. Here, we shall derive the convolutional relation (6) for continuous functions, and the same relation for discrete functions in Section A.2. Consider convolution of two functions x(t) and f(t) with their Fourier transforms X(ω) and F(ω), respectively,

 ${\displaystyle y(t)=f(t)\ast x(t),}$ (21)

which is explicitly given by the integral

 ${\displaystyle y(t)=\int _{-\infty }^{+\infty }f(t-t')\;x(t')\;dt'.}$ (22)

The Fourier transform of the resulting function y(t) is

 ${\displaystyle Y(\omega )=\int _{-\infty }^{+\infty }y(t)\;\;\exp(-i\omega t)\;dt.}$ (23)

Substitute the convolution integral of equation (22) into equation (23)

 ${\displaystyle Y(\omega )=\int _{-\infty }^{+\infty }{\left[{\int _{-\infty }^{+\infty }{f\left({t-t'}\right)x\left({t'}\right)\;dt'}}\right]}\quad \exp \left({-i\omega t}\right)\;dt,}$ (24)

and interchange the two integrals

 ${\displaystyle Y(\omega )=\int _{-\infty }^{+\infty }{x\left({t'}\right)}\left[{\int _{-\infty }^{+\infty }{f\left({t-t'}\right)\;\;}\exp \left({-i\omega t}\right)\;dt}\right]\;dt'.}$ (25)

From the shift theorem given by entry (1) of Table A-1, we have

 ${\displaystyle \int _{-\infty }^{+\infty }f(t-t')\;\;\exp(-i\omega t)\;dt=F(\omega )\;\;\exp(-i\omega t').}$ (26)

Use this relation in equation (25) to get

 ${\displaystyle Y(\omega )=\int _{-\infty }^{+\infty }x(t')\;[F(\omega )\;\exp(-i\omega t')]\;dt',}$ (27)

then rearrange the terms to obtain

 ${\displaystyle Y(\omega )=F(\omega )\int _{-\infty }^{+\infty }x(t')\;\exp(-i\omega t')\;dt'.}$ (28)

Note that the integral in equation (28) is the Fourier transform of x(t), and therefore,

 ${\displaystyle Y(\omega )=F(\omega )\;X(\omega ),}$ (29)

which is the desired result given by entry (6) of Table A-1.
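The same relation can be checked numerically for discrete sequences (a sketch, not from the text): the DFT of the full linear convolution of two sequences equals the product of their DFTs, provided both transforms are zero-padded to the full output length.

```python
import numpy as np

# Numerical check of entry (6) of Table A-1 for discrete sequences.
f = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0, 7.0])

y = np.convolve(f, x)      # full linear convolution, length 3 + 4 - 1 = 6
n = len(y)

Y = np.fft.fft(y)                              # transform of the convolution
FX = np.fft.fft(f, n) * np.fft.fft(x, n)       # product of zero-padded transforms

print(np.allclose(Y, FX))  # True
```

Without the zero-padding to length n, the product of the transforms would correspond to a circular, not linear, convolution.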

## Math, code, and pictures

We are now ready to implement the neural network itself. Neural networks consist of three or more layers: an input layer, one or more hidden layers, and an output layer.

Let's implement a network with one hidden layer. The layers are as follows:

Input layer: ${\displaystyle x^{(i)}}$

Hidden layer: ${\displaystyle a_{1}^{(i)}=\sigma (W_{1}x^{(i)}+b_{1})}$

Output layer: ${\displaystyle {\hat {y}}^{(i)}=W_{2}a_{1}^{(i)}+b_{2}}$

where ${\displaystyle x^{(i)}}$ is the i-th sample of the input data ${\displaystyle X}$; ${\displaystyle W_{1}}$, ${\displaystyle b_{1}}$, ${\displaystyle W_{2}}$, and ${\displaystyle b_{2}}$ are the weight matrices and bias vectors for layers 1 and 2, respectively; and ${\displaystyle \sigma }$ is our nonlinear function. Applying the nonlinearity to ${\displaystyle W_{1}x^{(i)}+b_{1}}$ in layer 1 results in the activation ${\displaystyle a_{1}}$. The output layer yields ${\displaystyle {\hat {y}}^{(i)}}$, the i-th estimate of the desired output. We're not going to apply the nonlinearity to the output, but people often do. The weights are randomly initialized, and the biases start at zero. During training they will be iteratively updated to encourage the network to converge on an optimal approximation to the expected output.

We'll start by defining the forward pass, using NumPy's @ operator for matrix multiplication:

```python
def forward(xi, W1, b1, W2, b2):
    z1 = W1 @ xi + b1
    a1 = sigma(z1)
    z2 = W2 @ a1 + b2
    return z2, a1
```
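The excerpt calls sigma() without defining it. A common choice, and one consistent with the later call sigma(a1, forward=False), is the logistic sigmoid, whose derivative can be written in terms of the activation itself. Here is a self-contained sketch (repeating forward() so the example runs on its own; the tiny layer sizes are illustrative):

```python
import numpy as np

def sigma(z, forward=True):
    """Logistic sigmoid; with forward=False, return its derivative,
    expressed in terms of an already-computed activation z = sigma(x)."""
    if forward:
        return 1.0 / (1.0 + np.exp(-z))
    return z * (1.0 - z)

def forward(xi, W1, b1, W2, b2):
    z1 = W1 @ xi + b1
    a1 = sigma(z1)
    z2 = W2 @ a1 + b2
    return z2, a1

# Usage: 3 inputs, 5 hidden units, 1 output; random weights, zero biases.
rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)

z2, a1 = forward(np.array([0.5, -1.0, 2.0]), W1, b1, W2, b2)
print(z2.shape, a1.shape)  # (1,) (5,)
```

Writing the derivative in terms of the activation is why backward() can pass a1, rather than the pre-activation z1, to sigma().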


Below is a picture of a neural network similar to the one we're building:

We see a simple neural network that takes three numbers as input (the green neurons) and outputs one number (the red neuron). In the middle (the orange neurons) is a so-called hidden layer, which in this case has five neurons or units. Moving information from the input layer to the hidden layer to the output layer is as simple as matrix multiplication and addition. In the hidden layer, we also apply the sigmoid function to each of the numbers.

We can “teach” this simple system to model a mapping between one set of numbers and another set. For example, we can train this system to output a two when we input a one, a four when we input a two, and 2N when we input an N. This is equivalent to building a linear model. More interestingly, we could teach it a nonlinear model: one maps to one, two maps to four, and ${\displaystyle N}$ maps to ${\displaystyle N^{2}}$. More interestingly still, we could teach it to combine multiple inputs into a single output.

In this tutorial, we'll train a model like this to learn the reflectivity for P–P reflections at an interface. (Normally we would use the Zoeppritz equation to do this — our only purpose here is to show that even a simple neural network can learn a nonlinear function. We wouldn't really want to compute the reflectivity this way.)

Instead of three inputs, we'll use seven: ${\displaystyle V_{P},V_{S},}$ and ${\displaystyle \rho }$ for the upper and lower layer properties at each interface, plus the angle of incidence, ${\displaystyle \theta }$, at each interface. And instead of five units in the hidden layer, we'll use 300.
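Under these choices, the parameter shapes work out as follows (a sketch; the initialization scale is an illustrative assumption, while the random weights and zero biases follow the text):

```python
import numpy as np

# Shapes implied by the text: 7 inputs (Vp, Vs, rho above and below, plus theta),
# 300 hidden units, 1 output (the reflectivity).
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 7, 300, 1

W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))   # hidden-layer weights
b1 = np.zeros(n_hidden)                             # hidden-layer biases
W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))  # output-layer weights
b2 = np.zeros(n_out)                                # output-layer bias

print(W1.shape, W2.shape)  # (300, 7) (1, 300)
```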

How does the network learn? The short version is that we show the system a bunch of corresponding input/output pairs we want it to learn, and we show it these pairs many times. Every time we do so, we move the ${\displaystyle W}$'s and ${\displaystyle b}$'s in whatever direction will make the outputs of the network more similar to the known output we're trying to teach it.

This iterative adjustment of weights and biases relies on a process called back propagation of errors.

Back propagation is the critical piece of thinking that enabled the deep-learning revolution. It is the reason Google can find images of flowers, or translate from Hindi to English. It is the reason we can predict the failure of drilling equipment days in advance of failure (see my video at http://bit.ly/2Ks5tQf for more on this).

Here is the back-propagation algorithm we'll employ:

For each training example:

For each layer:

• Calculate the error.
• Update weights.
• Update biases.

This is straightforward for the output layer. However, to calculate the update at the hidden layer, we need the gradient of the error with respect to the hidden layer's weights and biases, and that involves the derivative of the activation function. That's why sigma() must be able to return its derivative when called with forward=False.

Let's implement the inner loop as a Python function:

```python
def backward(xi, yi,
             a1, z2,
             params,
             learning_rate):

    err_output = z2 - yi

    derivative = sigma(a1, forward=False)
    err_hidden = err_output * derivative * params['W2']
    grad_W1 = err_hidden[:, None] @ xi[None, :]
```