Math and tables
Given a continuous function x(t) of a single variable t, its Fourier transform is defined by the integral
$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-i\omega t}\, dt, \tag{13}$$
where ω is the Fourier dual of the variable t. If t signifies time, then ω is angular frequency. The temporal frequency f is related to the angular frequency ω by ω = 2πf.
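The forward transform integral can be approximated numerically when x(t) is sampled finely and decays to zero inside the sampled window. The sketch below is a minimal illustration under those assumptions. It uses a plain Riemann sum (the helper name `fourier_transform` is hypothetical) and a Gaussian test signal. Under this sign convention, and with no scale factor, that signal's transform is √(2π) exp(−ω²/2).

```python
import numpy as np

def fourier_transform(x, t, omega):
    """Riemann-sum approximation of X(omega) = integral of x(t) exp(-i omega t) dt.

    Assumes t is a uniform grid and x is negligible outside it.
    """
    dt = t[1] - t[0]
    # Kernel of shape (len(omega), len(t)) holding exp(-i * omega * t).
    kernel = np.exp(-1j * np.outer(omega, t))
    return kernel @ x * dt

t = np.linspace(-20.0, 20.0, 4001)    # time axis, dt = 0.01
omega = np.linspace(-5.0, 5.0, 101)   # angular frequencies
x = np.exp(-t**2 / 2)                 # Gaussian test signal

X = fourier_transform(x, t, omega)
X_exact = np.sqrt(2 * np.pi) * np.exp(-omega**2 / 2)   # closed form, same convention
```

Comparing `X` with `X_exact` shows that the sampled sum reproduces the analytic transform.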
The Fourier transform is reversible; that is, given X(ω), the corresponding time function is
$$x(t) = \int_{-\infty}^{\infty} X(\omega)\, e^{i\omega t}\, d\omega. \tag{14}$$
Throughout this book, the following sign convention is used for the Fourier transform. For the forward transform, the sign of the argument in the exponent is negative if the variable is time and positive if the variable is space. The inverse transform uses the opposite sign from the corresponding forward transform. For convenience, the scale factor 2π is omitted from equations (13) and (14).
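Because the scale factor is omitted, applying the inverse integral to the output of the forward integral returns 2π x(t) rather than x(t). The small numerical sketch below makes that factor visible. It uses Riemann sums on assumed uniform grids and an arbitrary smooth test signal.

```python
import numpy as np

# Uniform grids wide enough that the signal and its spectrum vanish at the edges.
t = np.linspace(-12.0, 12.0, 801)        # dt = 0.03
omega = np.linspace(-12.0, 12.0, 801)    # d(omega) = 0.03
dt = t[1] - t[0]
dw = omega[1] - omega[0]

x = np.exp(-t**2 / 2) * np.cos(3 * t)    # smooth, decaying test signal

# Forward transform: negative exponent sign for the time variable.
X = np.exp(-1j * np.outer(omega, t)) @ x * dt

# Inverse transform: opposite (positive) sign, scale factor omitted.
x_back = np.exp(1j * np.outer(t, omega)) @ X * dw
```

Dividing `x_back` by 2π recovers `x` closely, while `x_back` itself is 2π times too large.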
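In general, X(ω) is a complex function, so it can be expressed in terms of two real functions of frequency:
$$X(\omega) = A(\omega)\, e^{i\phi(\omega)}, \tag{15}$$
where A(ω) and ϕ(ω) are the amplitude and phase spectra, respectively. They are computed from
$$A(\omega) = \sqrt{X_r^2(\omega) + X_i^2(\omega)}$$
and
$$\phi(\omega) = \tan^{-1}\left[\frac{X_i(\omega)}{X_r(\omega)}\right],$$
where Xr(ω) and Xi(ω) are the real and imaginary parts of the Fourier transform X(ω). When X(ω) is expressed in terms of its real and imaginary components,
$$X(\omega) = X_r(\omega) + i\, X_i(\omega),$$
and compared with equation (15), it follows that
$$X_r(\omega) = A(\omega) \cos\phi(\omega)$$
and
$$X_i(\omega) = A(\omega) \sin\phi(\omega).$$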
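In code, the amplitude and phase spectra follow directly from the real and imaginary parts. The sketch below uses a few arbitrary spectral values. It computes the phase with `np.arctan2`, a quadrant-aware form of tan⁻¹(Xi/Xr). The plain ratio cannot tell, for example, −1 + i from 1 − i, because both give Xi/Xr = −1.

```python
import numpy as np

# Arbitrary illustrative spectral values X = Xr + i*Xi.
X = np.array([3.0 + 4.0j, -1.0 + 1.0j, -2.0 - 2.0j, 0.5 - 0.0j])

Xr, Xi = X.real, X.imag
A = np.sqrt(Xr**2 + Xi**2)     # amplitude spectrum (equivalent to np.abs(X))
phi = np.arctan2(Xi, Xr)       # phase spectrum (equivalent to np.angle(X))

# Recover the real and imaginary parts from amplitude and phase.
Xr_back = A * np.cos(phi)
Xi_back = A * np.sin(phi)
```

For the second value, −1 + i, `np.arctan(Xi / Xr)` gives −π/4, whereas the phase that actually reproduces the point is 3π/4.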
We now consider two functions, x(t) and f(t), with Fourier transforms X(ω) and F(ω). Table A-1 lists basic theorems that are useful in various applications of the Fourier transform.
Table A-1. Basic Fourier transform theorems.

| Operation | Time domain | Frequency domain |
|---|---|---|
| (1) Shifting | x(t − τ) | exp(−iωτ) X(ω) |
| (4) Addition | f(t) + x(t) | F(ω) + X(ω) |
| (5) Multiplication | f(t) x(t) | F(ω) * X(ω) |
| (6) Convolution | f(t) * x(t) | F(ω) X(ω) |
| (7) Autocorrelation | x(t) * x(−t) | \|X(ω)\|² |
| (8) Parseval's theorem | ∫ \|x(t)\|² dt | ∫ \|X(ω)\|² dω |

* denotes convolution.
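Several entries of Table A-1 have discrete counterparts that can be checked with NumPy's FFT, whose forward transform also uses a negative exponent sign. The sketch below is a numerical sanity check on a random signal, not a proof. The discrete versions involve circular shifts, and Parseval's theorem picks up a 1/N factor, which is the discrete analog of the omitted scale factor.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)       # real test signal
X = np.fft.fft(x)                # forward DFT: sum of x[n] * exp(-2j*pi*k*n/N)
k = np.arange(N)

# (1) Shifting: a circular delay of m samples multiplies X by exp(-2j*pi*k*m/N).
m = 5
X_shifted = np.fft.fft(np.roll(x, m))            # np.roll(x, m)[n] == x[n - m]
shift_ok = np.allclose(X_shifted, np.exp(-2j * np.pi * k * m / N) * X)

# (7) Autocorrelation: circular r[lag] = sum_n x[n] * x[n - lag] has spectrum |X|^2.
r = np.array([np.dot(x, np.roll(x, lag)) for lag in range(N)])
autocorr_ok = np.allclose(np.fft.fft(r), np.abs(X) ** 2)

# (8) Parseval: energy is preserved up to the DFT's 1/N scale factor.
parseval_ok = np.isclose(np.sum(x**2), np.sum(np.abs(X) ** 2) / N)
```

All three flags come out `True`.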
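Proofs of these theorems can be found in the classic reference on Fourier transforms by Bracewell (1965), and some of the proofs are left to the exercises at the end of this chapter. Here, we derive the convolution relation (6) for continuous functions; the same relation for discrete functions is derived in Section A.2. Consider the convolution of two functions x(t) and f(t) with Fourier transforms X(ω) and F(ω), respectively,
$$y(t) = x(t) * f(t),$$
which is explicitly given by the integral
$$y(t) = \int_{-\infty}^{\infty} x(\tau)\, f(t - \tau)\, d\tau.$$
The Fourier transform of the resulting function y(t) is
$$Y(\omega) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x(\tau)\, f(t - \tau)\, d\tau \right] e^{-i\omega t}\, dt.$$
Interchanging the two integrals gives
$$Y(\omega) = \int_{-\infty}^{\infty} x(\tau) \left[ \int_{-\infty}^{\infty} f(t - \tau)\, e^{-i\omega t}\, dt \right] d\tau. \tag{25}$$
From the shift theorem given by entry (1) of Table A-1, we have
$$\int_{-\infty}^{\infty} f(t - \tau)\, e^{-i\omega t}\, dt = e^{-i\omega\tau} F(\omega).$$
Use this relation in equation (25) to get
$$Y(\omega) = \int_{-\infty}^{\infty} x(\tau)\, e^{-i\omega\tau} F(\omega)\, d\tau,$$
then rearrange the terms to obtain
$$Y(\omega) = F(\omega) \int_{-\infty}^{\infty} x(\tau)\, e^{-i\omega\tau}\, d\tau. \tag{28}$$
Note that the integral in equation (28) is the Fourier transform of x(t), and therefore,
$$Y(\omega) = F(\omega)\, X(\omega),$$
which is the desired result given by entry (6) of Table A-1.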
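The convolution result can also be checked numerically in discrete form. Section A.2 treats discrete functions properly. The sketch below compares direct convolution with `np.convolve` against the product of zero-padded FFTs. The zero padding is needed because a product of DFTs corresponds to circular, not linear, convolution.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(50)
f = rng.standard_normal(20)

# Direct linear convolution y = x * f (discrete form of the convolution integral).
y_direct = np.convolve(x, f)

# Frequency-domain route: zero-pad both to the full output length,
# multiply the spectra, Y = F X, and transform back.
n = len(x) + len(f) - 1
Y = np.fft.fft(f, n) * np.fft.fft(x, n)
y_fft = np.fft.ifft(Y).real
```

Both routes yield the same 69-sample result to within rounding error.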
Math, code, and pictures
We are now ready to implement the neural network itself. Neural networks consist of three or more layers: an input layer, one or more hidden layers, and an output layer.
Let's implement a network with one hidden layer. The layers are as follows:
$$\mathbf{z}_1 = \mathbf{W}_1 \mathbf{x}_i + \mathbf{b}_1$$
$$\mathbf{a}_1 = \sigma(\mathbf{z}_1)$$
$$\mathbf{z}_2 = \mathbf{W}_2 \mathbf{a}_1 + \mathbf{b}_2$$
$$\hat{y}_i = \mathbf{z}_2$$
where $\mathbf{x}_i$ is the i-th sample of the input data; $\mathbf{W}_1$, $\mathbf{W}_2$ and $\mathbf{b}_1$, $\mathbf{b}_2$ are the weight matrices and bias vectors for layers 1 and 2, respectively; and $\sigma$ is our nonlinear function. Applying the nonlinearity to $\mathbf{z}_1$ in layer 1 results in the activation $\mathbf{a}_1$. The output layer yields $\hat{y}_i$, the i-th estimate of the desired output. We're not going to apply the nonlinearity to the output, but people often do. The weights are randomly initialized, and the biases start at zero. During training they will be iteratively updated to encourage the network to converge on an optimal approximation to the expected output.
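The text says the weights start random and the biases start at zero, but it doesn't show the initialization. The sketch below makes some assumptions, flagged here. The helper `init_params` is hypothetical, and the 0.1 standard deviation is an arbitrary choice. W2 is stored as a 1-D vector with a scalar b2 because, for a single output, that shape is what makes the `err_hidden[:, None] @ xi[None, :]` line in `backward()` work.

```python
import numpy as np

def init_params(n_in, n_hidden, seed=42):
    """Hypothetical helper: random weights, zero biases, single output unit."""
    rng = np.random.default_rng(seed)
    return {
        'W1': rng.normal(scale=0.1, size=(n_hidden, n_in)),  # layer-1 weights
        'b1': np.zeros(n_hidden),                            # layer-1 biases
        'W2': rng.normal(scale=0.1, size=n_hidden),          # layer-2 weights (1-D)
        'b2': 0.0,                                           # layer-2 bias (scalar)
    }

# Seven inputs and 300 hidden units, the sizes the tutorial uses.
params = init_params(n_in=7, n_hidden=300)
```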
We'll start by defining the forward pass, using NumPy's @ operator for matrix multiplication:
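```python
def forward(xi, W1, b1, W2, b2):
    """Forward pass for one sample; sigma is the nonlinearity, defined elsewhere."""
    z1 = W1 @ xi + b1   # hidden-layer pre-activation
    a1 = sigma(z1)      # hidden-layer activation
    z2 = W2 @ a1 + b2   # output; no nonlinearity applied here
    return z2, a1
```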
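`forward()` relies on a `sigma` function that isn't shown in this section. The sketch below assumes the logistic sigmoid, which is the nonlinearity named in the network description. It also assumes that `forward=False` returns the derivative written in terms of the activation, a(1 − a). That is consistent with how `backward()` calls `sigma(a1, forward=False)` with the activation rather than with z1.

```python
import numpy as np

def sigma(z, forward=True):
    """Logistic sigmoid, or (assumed) its derivative when forward=False.

    In derivative mode the argument is taken to be the activation a = sigma(z),
    so sigma'(z) = a * (1 - a) is formed without recomputing z.
    """
    if forward:
        return 1.0 / (1.0 + np.exp(-z))
    return z * (1.0 - z)
```

For example, `sigma(0.0)` is 0.5, and `sigma(0.5, forward=False)` is 0.25, the sigmoid's slope at z = 0.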
Below is a picture of a neural network similar to the one we're building:
We see a simple neural network that takes three numbers as input (the green neurons) and outputs one number (the red neuron). In the middle (the orange neurons), we have a so-called hidden layer, which in this case has five neurons or units. Moving information from the input layer to the hidden layer to the output layer takes only matrix multiplication and addition. In the hidden layer, we also apply the sigmoid function to each of the numbers.
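The pictured 3-5-1 network fits in a few lines. The sketch below uses random weights (arbitrary values, for illustration only) to show that each step is a matrix multiply plus an addition, with the sigmoid applied elementwise in the hidden layer.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.2, -1.0, 0.5])      # three inputs (the green neurons)
W1 = rng.normal(size=(5, 3))        # input -> hidden weights
b1 = np.zeros(5)                    # hidden biases
W2 = rng.normal(size=5)             # hidden -> output weights (single output)
b2 = 0.0                            # output bias

z1 = W1 @ x + b1                    # multiply and add: five numbers
a1 = 1.0 / (1.0 + np.exp(-z1))      # sigmoid applied to each hidden number
y_hat = W2 @ a1 + b2                # multiply and add: one number (the red neuron)
```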
We can “teach” this simple system to model a mapping between one set of numbers and another set. For example, we can train this system to output a two when we input a one, a four when we input a two, and 2N when we input an N. This is equivalent to building a linear model. More interestingly, we could teach it a nonlinear model: one maps to one, two maps to four, and N maps to N². More interestingly still, we could teach it to combine multiple inputs into a single output.
In this tutorial, we'll train a model like this to learn the reflectivity for P–P reflections at an interface. (Normally we would use the Zoeppritz equation to do this — our only purpose here is to show that even a simple neural network can learn a nonlinear function. We wouldn't really want to compute the reflectivity this way.)
Instead of three inputs, we'll use seven: $V_\mathrm{P}$, $V_\mathrm{S}$, and $\rho$ for the upper and lower layer properties at each interface, plus the angle of incidence, $\theta$, at each interface. And instead of five units in the hidden layer, we'll use 300.
How does the network learn? The short version is that we show the system a bunch of corresponding input/output pairs we want it to learn, and we show it these pairs many times. Every time we do so, we move the weights $\mathbf{W}$ and biases $\mathbf{b}$ in whatever direction will make the outputs of the network more similar to the known output we're trying to teach it.
This iterative adjustment of weights and biases relies on a process called back propagation of errors.
Back propagation is the critical piece of thinking that enabled the deep-learning revolution. It is the reason Google can find images of flowers, or translate from Hindi to English. It is the reason we can predict the failure of drilling equipment days in advance (see my video at http://bit.ly/2Ks5tQf for more on this).
Here is the back-propagation algorithm we'll employ:
For each training example:
- For each layer, starting at the output layer:
  - Calculate the error.
  - Calculate the weight gradient.
  - Update the weights.
  - Calculate the bias gradient.
  - Update the biases.
This is straightforward for the output layer. To calculate the gradients at the hidden layer, however, we have to propagate the output error back through the output weights and through the nonlinearity. That's why we need the derivative of σ, which the backward pass obtains by calling sigma(a1, forward=False).
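One way to see where the hidden-layer gradient comes from is to check the chain-rule expression used in `backward()` against finite differences. That expression is the output error times the sigmoid derivative times W2, followed by an outer product with the input. This sketch assumes a half-squared-error loss, ½(z2 − y)². That loss is the one whose output-layer gradient is `z2 - yi`. The small network and data are random and purely illustrative.

```python
import numpy as np

def loss(W1, b1, W2, b2, xi, yi):
    """Assumed loss: half squared error of a one-hidden-layer sigmoid network."""
    a1 = 1.0 / (1.0 + np.exp(-(W1 @ xi + b1)))
    z2 = W2 @ a1 + b2
    return 0.5 * (z2 - yi) ** 2

rng = np.random.default_rng(3)
n_in, n_hidden = 4, 6
W1 = rng.normal(size=(n_hidden, n_in))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=n_hidden)
b2 = 0.1
xi, yi = rng.normal(size=n_in), 0.7

# Analytic gradient: chain rule back through W2 and the sigmoid derivative a(1 - a).
a1 = 1.0 / (1.0 + np.exp(-(W1 @ xi + b1)))
err_output = W2 @ a1 + b2 - yi                     # dL/dz2
err_hidden = err_output * W2 * a1 * (1.0 - a1)     # dL/dz1
grad_W1 = np.outer(err_hidden, xi)                 # dL/dW1

# Central finite differences, one hidden-layer weight at a time.
h = 1e-6
grad_W1_fd = np.zeros_like(W1)
for j in range(n_hidden):
    for k in range(n_in):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[j, k] += h
        Wm[j, k] -= h
        grad_W1_fd[j, k] = (loss(Wp, b1, W2, b2, xi, yi)
                            - loss(Wm, b1, W2, b2, xi, yi)) / (2 * h)
```

The two gradient arrays agree to within finite-difference error.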
Let's implement the inner loop as a Python function:
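```python
def backward(xi, yi, a1, z2, params, learning_rate):
    """One back-propagation step for a single training example.

    Updates params in place and returns it. Assumes sigma(a, forward=False)
    returns the sigmoid derivative written in terms of the activation a.
    """
    # Output layer: error, weight gradient, update, bias gradient, update.
    err_output = z2 - yi
    W2_old = params['W2'].copy()   # hidden-layer error must use W2 before its update
    grad_W2 = err_output * a1
    params['W2'] -= learning_rate * grad_W2
    grad_b2 = err_output
    params['b2'] -= learning_rate * grad_b2

    # Hidden layer: propagate the error back through W2 and the sigmoid.
    derivative = sigma(a1, forward=False)
    err_hidden = err_output * derivative * W2_old
    grad_W1 = err_hidden[:, None] @ xi[None, :]   # outer product: (hidden, inputs)
    params['W1'] -= learning_rate * grad_W1
    grad_b1 = err_hidden
    params['b1'] -= learning_rate * grad_b1
    return params
```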
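`backward()` handles one example, but the outer "for each training example" loop, repeated over many passes, isn't shown. The sketch below is self-contained and uses hypothetical toy data: y = x² on [0, 2], in the spirit of "two maps to four", not the Zoeppritz data the tutorial builds next. It repeats `sigma`, `forward`, and a `backward` so that it runs on its own. The assumed `sigma` derivative mode returns a(1 − a), and the hidden error is computed before W2 is updated. The learning rate, epoch count, hidden size, and initialization are arbitrary choices.

```python
import numpy as np

def sigma(z, forward=True):
    """Logistic sigmoid; with forward=False, its derivative given the activation."""
    if forward:
        return 1.0 / (1.0 + np.exp(-z))
    return z * (1.0 - z)

def forward(xi, W1, b1, W2, b2):
    z1 = W1 @ xi + b1
    a1 = sigma(z1)
    z2 = W2 @ a1 + b2
    return z2, a1

def backward(xi, yi, a1, z2, params, learning_rate):
    err_output = z2 - yi
    # Hidden-layer error uses W2 as it was before this step's update.
    err_hidden = err_output * sigma(a1, forward=False) * params['W2']
    params['W2'] -= learning_rate * err_output * a1
    params['b2'] -= learning_rate * err_output
    params['W1'] -= learning_rate * np.outer(err_hidden, xi)
    params['b1'] -= learning_rate * err_hidden
    return params

def mse(X, y, p):
    """Mean squared error of the network over the whole data set."""
    preds = np.array([forward(xi, p['W1'], p['b1'], p['W2'], p['b2'])[0] for xi in X])
    return np.mean((preds - y) ** 2)

rng = np.random.default_rng(0)

# Hypothetical toy data: y = x**2 for x in [0, 2].
X = rng.uniform(0.0, 2.0, size=(50, 1))
y = X[:, 0] ** 2

# Random weights, zero biases; 20 hidden units (arbitrary).
params = {
    'W1': rng.normal(scale=1.0, size=(20, 1)),
    'b1': np.zeros(20),
    'W2': rng.normal(scale=0.1, size=20),
    'b2': 0.0,
}

initial_loss = mse(X, y, params)
for epoch in range(300):                      # show the pairs many times
    for i in rng.permutation(len(X)):         # for each training example
        z2, a1 = forward(X[i], params['W1'], params['b1'], params['W2'], params['b2'])
        params = backward(X[i], y[i], a1, z2, params, learning_rate=0.05)
final_loss = mse(X, y, params)
```

Shuffling the examples on each pass is a common choice for per-example updates. Comparing `initial_loss` with `final_loss` shows whether the network has learned the mapping.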
To demonstrate this back-propagation workflow, and thus that our system can learn, let's try to get the above neural network to learn the Zoeppritz equation. We're going to need some data.
- Bracewell, R. N., 1965, The Fourier transform and its applications: McGraw-Hill Book Co.