Theory, interactive quizzes & live code#

This page shows an example of how to combine theory, quizzes, and code on one page. Headings and boxes indicate the interactive parts (for now only if they are part of a single cell). Automatic execution has been activated, and a custom .ipynb file has replaced the download button.

Weighted least-squares estimation#

In ordinary least-squares estimation, we assume that all observations are equally important. In many cases this is not realistic, as observations may be obtained by different measurement systems, or under different circumstances. We want our methodology for least-squares estimation to be able to take this into account.

We achieve this goal by introducing a weight matrix in the minimization problem:

\[ \underset{\mathrm{x}}{\min} {\mathrm{(y-Ax)^T W(y-Ax)}} \]

In the unweighted least-squares approach, we arrive at the normal equation by pre-multiplying both sides of \(\mathrm{y=Ax}\) with the transpose of the design matrix \(\mathrm{A^T}\):

\[ \mathrm{y=Ax} \; \rightarrow \; \mathrm{A^T\; y = A^T\; A x } \]

In the weighted least-squares approach, we include the weight matrix \(\mathrm{W}\) in this pre-multiplication factor, i.e., \( \mathrm{A^T W}\), to obtain the normal equation:

\[ \mathrm{y=Ax} \; \rightarrow \; \mathrm{A^T W \; y = A^TW \; A x} \]

The normal matrix is now defined as \(\mathrm{N=A^T W A}\). From this, assuming that the normal matrix \(\mathrm{N}\) is invertible (non-singular), we find the weighted least-squares estimate \( \mathrm{\hat{x}} \),

\[\begin{split} \begin{align*} \mathrm{\hat{x}} &= \mathrm{(A^T W A)^{-1} A^T W y} \\ &= \arg \underset{\mathrm{x}}{\min} {\mathrm{(y-Ax)^T W(y-Ax)}} \end{align*} \end{split}\]
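That this pre-multiplication indeed solves the minimization problem can be verified by setting the gradient of the weighted sum of squared residuals with respect to \(\mathrm{x}\) to zero (assuming, as is standard, that \(\mathrm{W}\) is symmetric):

\[ \frac{\partial}{\partial \mathrm{x}} \mathrm{(y-Ax)^T W(y-Ax)} = \mathrm{-2A^T W(y-Ax)} = 0 \; \rightarrow \; \mathrm{A^T W A \hat{x} = A^T W y} \]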

We also find the derived estimates \( \mathrm{\hat{y}} \) and \( \mathrm{\hat{\epsilon}} \):

\[ \mathrm{\hat{y} = A \hat{x} = A (A^T W A )^{-1} A^T W y} \]
\[ \mathrm{\hat{\epsilon} = y - \hat{y}= y - A \hat{x} = y-A (A^T W A )^{-1} A^T W y = (I- A(A^T W A )^{-1} A^T W) y} \]
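A minimal numerical sketch of these formulas (the design matrix, weights, and observations below are made up for illustration):

import numpy as np

# small made-up example: straight-line fit through four observations
A = np.column_stack((np.ones(4), np.arange(4)))   # design matrix
y = np.array([0.1, 1.1, 1.9, 3.2])                # observations
W = np.diag([1.0, 1.0, 4.0, 4.0])                 # weight matrix

# xhat = (A^T W A)^(-1) A^T W y, computed without forming the inverse explicitly
xhat = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

yhat = A @ xhat     # adjusted observations
ehat = y - yhat     # residuals

Using np.linalg.solve on the normal matrix is numerically preferable to computing the inverse directly.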

Quiz question#

Video#

Discussion on the weight matrix#

The weight matrix \(\mathrm{W}\) expresses the (relative) weights between the observations. It is always a square matrix, and its size follows from the number of observations \(m\): the weight matrix is \(m\times m\).

If it is an identity matrix (\(\mathrm{W=I}\)), all observations have equal weight. Note that in this case the equations reduce to the ordinary least-squares solution.

If it is a diagonal matrix with different values on the diagonal, observations with a higher weight are considered more important. If the weight matrix has non-zero elements on the off-diagonal positions, this implies that (some) observations are correlated.
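A minimal sketch of these three cases for \(m=4\) observations (all numerical values here are made up for illustration):

import numpy as np

m = 4
W_equal = np.eye(m)                       # identity: all observations weighted equally
W_diag = np.diag([1.0, 1.0, 5.0, 5.0])    # diagonal: last two observations weighted more
W_corr = np.eye(m)                        # non-zero off-diagonal terms: correlated observations
W_corr[0, 1] = W_corr[1, 0] = 0.5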

Weighted least-squares estimator: properties#

Until now, we have looked at the weighted least-squares solution for a single realization of the observations. Generally, we assume that this is a realization of the random observable vector \(Y\), since measurements are affected by random errors. It follows that the weighted least-squares estimator is given by:

\[ \hat{X} = \mathrm{(A^T W A )^{-1} A^T W} Y \]

This estimator has two important properties: it is linear and unbiased.

The linearity property is due to the fact that \(\hat{X}\) is a linear function of the observables \(Y\).

The unbiased property means that the expectation of \(\hat{X}\) is equal to the true (but unknown) \(\mathrm{x}\). This can be shown as follows:

\[ \mathbb{E}(\hat{X}) = \mathrm{(A^T W A )^{-1} A^T W} \mathbb{E}(Y) = \mathrm{(A^T W A )^{-1} A^T W Ax = x} \]

This is a very desirable property. It implies that if we were to repeat the measurements many times and obtain a new estimate each time, the average of the estimated values would be equal to the true values.
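A minimal simulation sketch of this property (the true parameters, noise level, and number of repetitions below are illustration choices, not part of the exercise):

import numpy as np

rng = np.random.default_rng(42)
A = np.column_stack((np.ones(8), np.arange(8)))   # linear-trend design matrix
W = np.eye(8)                                     # equal weights
x_true = np.array([1.0, 0.9])                     # "true" parameters for this sketch

# repeat the measurements many times and average the resulting estimates
xhats = []
for _ in range(10000):
    y = A @ x_true + rng.normal(0.0, 0.5, size=8)            # noisy realization of Y
    xhats.append(np.linalg.solve(A.T @ W @ A, A.T @ W @ y))

print(np.mean(xhats, axis=0))  # close to x_true, illustrating unbiasedness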

Exercise#

You have a time series of 8 measurements and fit a model assuming a linear trend (constant velocity).

Times of observations, observed values, and the number of observations are given as the variables t, y, and m. NumPy and matplotlib.pyplot are available, abbreviated as np and plt.

# this will print all float arrays with 2 decimal places
import numpy as np
float_formatter = "{:.2f}".format
np.set_printoptions(formatter={'float_kind':float_formatter})

# install ipywidgets in the in-browser (Pyodide) kernel
import micropip
await micropip.install("ipywidgets")
import ipywidgets as widgets
from IPython.display import display
import operator

def check_answer(variable_name, expected, comparison = operator.eq):
    # build a button that, when clicked, compares a global variable to the expected value
    output = widgets.Output()
    button = widgets.Button(description="Check answer")
    def _inner_check(button):
        with output:
            # look up the student's answer by variable name and compare
            if comparison(globals()[variable_name], expected):
                output.outputs = [{'name': 'stdout', 'text': 'Correct!', 'output_type': 'stream'}]
            else:
                output.outputs = [{'name': 'stdout', 'text': 'Incorrect!', 'output_type': 'stream'}]
    button.on_click(_inner_check)
    display(button, output)
import numpy as np
import matplotlib.pyplot as plt
# times of observation [months]
t = np.arange(8)

# observed heights [m]
y = [1.39, 1.26, 1.48, 4.03, 5.89, 5.14, 6.47, 7.64]

# number of observations
m = len(t)

Fill in the correct \(\mathrm{A}\)-matrix.

# design matrix
A = np.column_stack((np.ones(m), t))
check_answer("A", np.column_stack((np.ones(m), t)), np.array_equiv)

Define the weight matrix for \(\mathrm{W} = \mathrm{I}_m\).

W_1 = 1  # placeholder so the answer check has an initial value to compare
# Weight matrix for case 1
W_1 = ?
check_answer("W_1",np.eye(m), np.array_equiv)

Define the weight matrix where the weight of the first 3 observations is five times as large as that of the remaining observations.

W_2 = 1  # placeholder so the answer check has an initial value to compare
# Weight matrix for case 2
W_2 = ?
# expected answer for case 2
w = 5
W_2_ans = np.eye(m)
W_2_ans[0, 0] = w
W_2_ans[1, 1] = w
W_2_ans[2, 2] = w
check_answer("W_2", W_2_ans, np.array_equiv)

Define the weight matrix where the weight of the 4th and 5th observations is five times as large as that of the remaining observations.

W_3 = 1  # placeholder so the answer check has an initial value to compare
# Weight matrix for case 3
W_3 = ?
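Following the pattern of the previous cases, the answer check for this case could look as follows (a sketch, assuming the same weight value w = 5 and the check_answer helper defined above):

# expected answer for case 3
w = 5
W_3_ans = np.eye(m)
W_3_ans[3, 3] = w
W_3_ans[4, 4] = w
check_answer("W_3", W_3_ans, np.array_equiv)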