# Weighted least-squares

Series: Geophysical References Series. Title: Problems in Exploration Seismology and their Solutions. Authors: Lloyd P. Geldart and Robert E. Sheriff. Chapter 9, pages 295-366. DOI: http://dx.doi.org/10.1190/1.9781560801733. ISBN 9781560801153.

## Problem 9.33

Find “best-fit” straight lines to the data in Table 9.33a.

Table 9.33a. Data points $(x, t)$.

| $x$ | $t$ | $x$ | $t$ |
|---|---|---|---|
| 0.21 | 0.51 | 3.05 | 4.42 |
| 0.49 | 1.31 | 3.09 | 3.25 |
| 0.71 | 1.54 | 3.28 | 3.07 |
| 1.00 | 2.58 | 3.64 | 3.50 |
| 1.42 | 1.79 | 3.70 | 3.73 |
| 1.73 | 2.20 | 3.84 | 3.63 |
| 2.03 | 2.76 | 4.07 | 3.87 |
| 2.47 | 2.72 | 4.24 | 3.88 |

1. Plot the data and determine a best-fit line by eye.
2. Find the unweighted least-squares best-fit line (i.e., all weights equal to 1).
3. Find the least-squares best-fit line, weighting each point according to its vertical distance from the line found in step 1.
4. Find the least-squares best-fit line after discarding the three wildest points (weighting them zero).

### Background

To fit a straight line $t=ax+b$ to a data set such as that in Table 9.33a, we find the constants $a$ and $b$ that minimize the sum of the squares of the “errors” (see also problem 9.22). An error is the difference between an observed value $t_{i}$ and the value $ax_{i}+b$ predicted by the equation. If we wish to give added weight to some data points, usually because we consider them more reliable than the others, we multiply each squared error by a weight $w_{i}$. The weighted sum of the squared errors, $E$, is then

$$E=\sum_{i}w_{i}\left[(ax_{i}+b)-t_{i}\right]^{2}, \qquad \text{(9.33a)}$$

and minimize $E$ with respect to $a$ and $b$. Setting the partial derivatives to zero gives

$$\frac{\partial E}{\partial a}=2\sum_{i}w_{i}x_{i}\left[(ax_{i}+b)-t_{i}\right]=0, \qquad \text{(9.33b)}$$

$$\frac{\partial E}{\partial b}=2\sum_{i}w_{i}\left[(ax_{i}+b)-t_{i}\right]=0. \qquad \text{(9.33c)}$$

We rewrite these as simultaneous equations to be solved for $a$ and $b$:

$$a\sum_{i}w_{i}x_{i}^{2}+b\sum_{i}w_{i}x_{i}=\sum_{i}w_{i}x_{i}t_{i}, \qquad \text{(9.33d)}$$

$$a\sum_{i}w_{i}x_{i}+b\sum_{i}w_{i}=\sum_{i}w_{i}t_{i}, \qquad \text{(9.33e)}$$

where $\sum_{i}w_{i}$ is the sum of the weights.
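Equations (9.33d) and (9.33e) form a 2×2 linear system, so $a$ and $b$ can be written in closed form. The Python sketch below solves the system with Cramer's rule. The function name `weighted_line_fit` and the four-point check data are illustrative assumptions, not part of the original problem.

```python
def weighted_line_fit(x, t, w=None):
    """Return (a, b) for t = a*x + b from equations (9.33d) and (9.33e).

    x, t: sequences of coordinates; w: optional weights (default: all 1).
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)                                               # sum of w_i
    swx = sum(wi * xi for wi, xi in zip(w, x))                # sum of w_i x_i
    swt = sum(wi * ti for wi, ti in zip(w, t))                # sum of w_i t_i
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))          # sum of w_i x_i^2
    swxt = sum(wi * xi * ti for wi, xi, ti in zip(w, x, t))   # sum of w_i x_i t_i

    det = swxx * sw - swx * swx
    if det == 0:
        raise ValueError("normal equations are singular (all weighted x equal?)")
    a = (swxt * sw - swx * swt) / det
    b = (swxx * swt - swx * swxt) / det
    return a, b


# Hypothetical check: points lying exactly on t = 2x + 1 are recovered
# whatever (positive) weights are used.
a, b = weighted_line_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], [1, 2, 3, 4])
print(a, b)  # → 2.0 1.0
```

Setting a weight to zero removes that point from every sum, which is how discarding points is expressed in this formulation.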

Curves other than a straight line can be fit to data sets in a similar manner. Other definitions of “best fit” can also be used. Additional constraints, for example, that the curve should pass through the origin, can also be added.
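As a brief illustration of such a constraint (a sketch in the same notation, not taken from the original solution), forcing the line through the origin means setting $b=0$, so that only $a$ is varied:

$$E=\sum_{i}w_{i}\left(ax_{i}-t_{i}\right)^{2},\qquad \frac{\partial E}{\partial a}=2\sum_{i}w_{i}x_{i}\left(ax_{i}-t_{i}\right)=0 \quad\Longrightarrow\quad a=\frac{\sum_{i}w_{i}x_{i}t_{i}}{\sum_{i}w_{i}x_{i}^{2}}.$$

A single equation then replaces the pair (9.33d) and (9.33e).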

### Solution

The data are plotted in Figure 9.33a and the calculations are given in Table 9.33c. The best-fit line determined by eye is shown by the dashed line; its equation is

$$t=1.00+0.71x \qquad \text{(eye-ball fit)}.$$

The line for equal weighting, shown by the solid line, has the equation

$$t=1.040+0.721x \qquad \text{(equal weighting, } w_{b}\text{)}.$$

The line giving increased weight to data that lie closer to the eye-ball line is shown by short dashes; its equation is

$$t=0.989+0.700x \qquad \text{(weighting by proximity to eye-ball line, } w_{c}\text{)}.$$

If we simply throw away the three points that lie farthest from the line ($w_{d}=0$ in Table 9.33b), we get the equation (not plotted)

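$$t=1.041+0.683x \qquad \text{(discarding three wild points, weights } w_{d}\text{)}.$$

Table 9.33b. Values of $x_{i}$, $t_{i}$, $x_{i}^{2}$, $x_{i}t_{i}$ and the weights $w_{b}$, $w_{c}$, $w_{d}$.

| | $x_{i}$ | $t_{i}$ | $x_{i}^{2}$ | $x_{i}t_{i}$ | $w_{b}$ | $w_{c}$ | $w_{d}$ |
|---|---|---|---|---|---|---|---|
| | 0.21 | 0.51 | 0.04 | 0.11 | 1 | 1 | 0 |
| | 0.49 | 1.31 | 0.24 | 0.64 | 1 | 5 | 1 |
| | 0.71 | 1.54 | 0.50 | 1.09 | 1 | 5 | 1 |
| | 1.00 | 2.58 | 1.00 | 2.58 | 1 | 1 | 0 |
| | 1.42 | 1.79 | 2.02 | 2.54 | 1 | 3 | 1 |
| | 1.73 | 2.20 | 2.99 | 3.81 | 1 | 5 | 1 |
| | 2.03 | 2.76 | 4.12 | 5.60 | 1 | 2 | 1 |
| | 2.47 | 2.72 | 6.10 | 6.72 | 1 | 5 | 1 |
| | 3.05 | 4.42 | 9.30 | 13.48 | 1 | 1 | 0 |
| | 3.09 | 3.25 | 9.55 | 10.04 | 1 | 4 | 1 |
| | 3.28 | 3.07 | 10.76 | 10.07 | 1 | 3 | 1 |
| | 3.64 | 3.50 | 13.25 | 12.74 | 1 | 4 | 1 |
| | 3.70 | 3.73 | 13.69 | 13.80 | 1 | 3 | 1 |
| | 3.84 | 3.63 | 14.75 | 13.94 | 1 | 4 | 1 |
| | 4.07 | 3.87 | 16.56 | 15.75 | 1 | 5 | 1 |
| | 4.24 | 3.88 | 17.98 | 16.45 | 1 | 4 | 1 |
| Sums$_{b}$ | 38.97 | 44.76 | 123.13 | 129.37 | 16 | | 13 |
| Sums$_{c}$ | 140.11 | 152.41 | 452.21 | 454.91 | | 55 | |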
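The three least-squares fits can also be recomputed directly from the sixteen data pairs. The Python sketch below does this for the weight columns $w_{b}$, $w_{c}$ and $w_{d}$. The helper name `fit` is an assumption introduced here. Sums accumulated in code do not reproduce every tabulated sum exactly (the $x_{i}^{2}$ values, for example, add up to about 122.86 rather than 123.13), so the coefficients it returns agree with the quoted ones only to within a few hundredths.

```python
# Data pairs and the three weight columns.
x = [0.21, 0.49, 0.71, 1.00, 1.42, 1.73, 2.03, 2.47,
     3.05, 3.09, 3.28, 3.64, 3.70, 3.84, 4.07, 4.24]
t = [0.51, 1.31, 1.54, 2.58, 1.79, 2.20, 2.76, 2.72,
     4.42, 3.25, 3.07, 3.50, 3.73, 3.63, 3.87, 3.88]
w_b = [1] * 16                                          # equal weighting
w_c = [1, 5, 5, 1, 3, 5, 2, 5, 1, 4, 3, 4, 3, 4, 5, 4]  # proximity weighting
w_d = [0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]  # three points discarded


def fit(w):
    """Return (a, b) solving the weighted normal equations (9.33d)-(9.33e)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swt = sum(wi * ti for wi, ti in zip(w, t))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxt = sum(wi * xi * ti for wi, xi, ti in zip(w, x, t))
    det = swxx * sw - swx ** 2
    a = (swxt * sw - swx * swt) / det   # slope from Cramer's rule
    b = (swt - a * swx) / sw            # intercept from equation (9.33e)
    return a, b


for label, w in [("w_b", w_b), ("w_c", w_c), ("w_d", w_d)]:
    a, b = fit(w)
    print(f"{label}: t = {b:.3f} + {a:.3f} x")
```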

The changes in the coefficients are less than 5% (standard deviation 3%), so the different weighting schemes make relatively little difference in this example.
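The text does not say how these percentages were obtained. One reading, assumed here, is that each slope and intercept is compared with the corresponding value for the eye-ball line. Under that assumption, the Python sketch below computes the six relative changes from the quoted coefficients; the variable names are illustrative.

```python
from statistics import stdev

# (slope a, intercept b) as quoted for each line.
eye = (0.71, 1.00)
others = {
    "equal weighting": (0.721, 1.040),
    "proximity weighting": (0.700, 0.989),
    "three points discarded": (0.683, 1.041),
}

changes = []  # percentage changes relative to the eye-ball line (assumed reference)
for label, (a, b) in others.items():
    da = 100.0 * (a - eye[0]) / eye[0]
    db = 100.0 * (b - eye[1]) / eye[1]
    changes += [da, db]
    print(f"{label}: slope {da:+.1f}%, intercept {db:+.1f}%")

print(f"largest change: {max(abs(c) for c in changes):.1f}%")
print(f"sample standard deviation: {stdev(changes):.1f}%")
```

With this reading, all six changes stay below 5% in magnitude and their sample standard deviation is about 3%.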

Eye-ball fit: $b=1.00,\;a=0.71$.

Equations for the equal-weighting line: $123.13a_{b}+38.97b_{b}=129.37$; $38.97a_{b}+16b_{b}=44.76$; hence $b_{b}=1.040,\;a_{b}=0.721$.

Weighting by proximity to the eye-ball line: $452.21a_{c}+140.11b_{c}=454.91$; $140.11a_{c}+55b_{c}=152.41$; hence $b_{c}=0.989,\;a_{c}=0.700$.

Throwing away the three wild points: $112.79a_{d}+34.71b_{d}=113.20$; $34.71a_{d}+13b_{d}=37.25$; hence $b_{d}=1.041,\;a_{d}=0.683$.
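Each pair of simultaneous equations listed above can be solved by Cramer's rule. The Python sketch below takes the coefficients exactly as tabulated and recovers the quoted slopes and intercepts to within about 0.001. The function name `solve_2x2` and the dictionary layout are assumptions made for illustration.

```python
def solve_2x2(a11, a12, r1, a21, a22, r2):
    """Solve a11*a + a12*b = r1 and a21*a + a22*b = r2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    a = (r1 * a22 - a12 * r2) / det
    b = (a11 * r2 - a21 * r1) / det
    return a, b


# Coefficients as tabulated: (sum wx^2, sum wx, sum wxt, sum wx, sum w, sum wt).
systems = {
    "equal weighting": (123.13, 38.97, 129.37, 38.97, 16, 44.76),
    "proximity weighting": (452.21, 140.11, 454.91, 140.11, 55, 152.41),
    "three points discarded": (112.79, 34.71, 113.20, 34.71, 13, 37.25),
}

results = {label: solve_2x2(*coeffs) for label, coeffs in systems.items()}
for label, (a, b) in results.items():
    print(f"{label}: a = {a:.4f}, b = {b:.4f}")
```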