GPs for Regression
In order to use Gaussian Processes for regression, the first step is of course to observe some targets $t_n$. These are noisy versions of the latent function values $y_n = y(\mathbf{x}_n)$, which means that we observe

$$
t_n = y_n + \epsilon_n
\qquad
\epsilon_n \sim \mathcal{N}\left(0, \beta^{-1}\right)
$$

where $\beta$ is the precision of the Gaussian observation noise. Furthermore, we would also like to predict a new target value $t_{N+1}$ at a new, unobserved input $\mathbf{x}_{N+1}$:
Fig. 30 A GP graph but now with noisy target observations and prediction for $t_{N+1}$.
and following the joint distribution encoded by the graph (check this by hand!), we can arrive at a marginal for all our observed targets $\mathbf{t} = \left[t_1, \dots, t_N\right]^\mathrm{T}$:

$$
p(\mathbf{t}) = \int p(\mathbf{t}\mid\mathbf{y})\, p(\mathbf{y})\,\mathrm{d}\mathbf{y} = \mathcal{N}\left(\mathbf{t}\mid\mathbf{0},\, \mathbf{C}_N\right)
\qquad
\mathbf{C}_N = \mathbf{K} + \beta^{-1}\mathbf{I}_N
$$

where we once again have simply followed the standard result for a Gaussian marginal under Bayesian inversion, in this case with prior $p(\mathbf{y}) = \mathcal{N}\left(\mathbf{y}\mid\mathbf{0}, \mathbf{K}\right)$ and noise model $p(\mathbf{t}\mid\mathbf{y}) = \mathcal{N}\left(\mathbf{t}\mid\mathbf{y}, \beta^{-1}\mathbf{I}_N\right)$.
From Eq. (76), which we have seen previously, we can now easily include the new target $t_{N+1}$ in the joint distribution:

$$
p\left(\begin{bmatrix}\mathbf{t}\\ t_{N+1}\end{bmatrix}\right)
= \mathcal{N}\left(\mathbf{0},\,
\begin{bmatrix}
\mathbf{C}_N & \mathbf{k}\\
\mathbf{k}^\mathrm{T} & c
\end{bmatrix}\right)
$$

where you should pay special attention to what comes from the GP prior and what comes from the noise model: the vector $\mathbf{k}$ collects the kernel values $k(\mathbf{x}_n, \mathbf{x}_{N+1})$ between the training inputs and the new input, while the scalar $c = k(\mathbf{x}_{N+1}, \mathbf{x}_{N+1}) + \beta^{-1}$ also contains the noise contribution. Conditioning on the observed targets then gives a Gaussian $p\left(t_{N+1}\mid\mathbf{t}\right)$, with posterior mean:

$$
m(\mathbf{x}_{N+1}) = \mathbf{k}^\mathrm{T}\mathbf{C}_N^{-1}\mathbf{t}
$$

and posterior variance:

$$
\sigma^2(\mathbf{x}_{N+1}) = c - \mathbf{k}^\mathrm{T}\mathbf{C}_N^{-1}\mathbf{k}
$$

and we can just as easily predict for multiple values of $\mathbf{x}_{N+1}$ at once by stacking the corresponding kernel vectors $\mathbf{k}$ into a matrix.
Look again at the expressions above: note how a GP for regression is a non-parametric model. Instead of storing information obtained during training in a weight vector, we always need the values of the training inputs $\mathbf{X}$ and targets $\mathbf{t}$ every time we make a new prediction.
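To see how these expressions translate into code, here is a minimal NumPy sketch of the predictive equations above. The Squared Exponential kernel, its hyperparameters, the toy data and the noise precision `beta` are all illustrative assumptions, not values from the text.

```python
import numpy as np

def squared_exponential(xa, xb, variance=1.0, lengthscale=1.0):
    """Squared Exponential kernel for 1D inputs (hyperparameters are assumed values)."""
    return variance * np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / lengthscale**2)

# Toy training data (illustrative) and noise precision beta
x_train = np.array([-1.5, -0.5, 0.3, 1.2])
t_train = np.sin(2.0 * x_train) + 0.1 * np.random.default_rng(0).normal(size=4)
beta = 100.0  # noise precision, so the noise variance is 1/beta

# Covariance of the observed targets: C_N = K + (1/beta) I
C_N = squared_exponential(x_train, x_train) + np.eye(len(x_train)) / beta

# Predict at several new inputs at once by stacking the kernel vectors k
x_new = np.linspace(-2.0, 2.0, 100)
K_new = squared_exponential(x_train, x_new)                    # column j is k for x_new[j]
c_new = squared_exponential(x_new, x_new).diagonal() + 1.0 / beta

# Posterior mean m(x*) = k^T C_N^{-1} t and variance sigma^2(x*) = c - k^T C_N^{-1} k
solved = np.linalg.solve(C_N, K_new)                           # C_N^{-1} k for every x*
mean = solved.T @ t_train
var = c_new - np.einsum("ij,ij->j", K_new, solved)
```

Note that the sketch solves the linear system $\mathbf{C}_N^{-1}\mathbf{k}$ with `np.linalg.solve` rather than inverting $\mathbf{C}_N$ explicitly; in practice a Cholesky factorization of $\mathbf{C}_N$ is the usual, more stable choice.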
Below you can see an example of a GP model for regression with a one-dimensional input, for which we opt for a Squared Exponential kernel. On the left you can see the prior over functions $y(x)$, and on the right the posterior over functions after conditioning on a small set of noisy observations.

Fig. 31 Example of GP for regression, prior (left) and posterior (right) over functions.
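Purely as an illustration, here is one way a figure like Fig. 31 could be generated: sampling functions from the prior and from the posterior on a dense grid of inputs. The data, kernel hyperparameters and number of samples are again assumptions made for this sketch (the actual plotting, e.g. with matplotlib, is omitted).

```python
import numpy as np

def squared_exponential(xa, xb, variance=1.0, lengthscale=1.0):
    return variance * np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / lengthscale**2)

rng = np.random.default_rng(1)
x_grid = np.linspace(-2.0, 2.0, 200)
jitter = 1e-6 * np.eye(len(x_grid))   # small jitter for numerical stability

# Left panel: sample functions from the GP prior
K_grid = squared_exponential(x_grid, x_grid)
prior_samples = rng.multivariate_normal(np.zeros(len(x_grid)), K_grid + jitter, size=5)

# Right panel: sample functions from the posterior after observing noisy targets
x_train = np.array([-1.5, -0.5, 0.3, 1.2])
t_train = np.array([0.1, -0.8, 0.6, 0.9])   # made-up observations
beta = 100.0
C_N = squared_exponential(x_train, x_train) + np.eye(len(x_train)) / beta
K_cross = squared_exponential(x_train, x_grid)

solved = np.linalg.solve(C_N, K_cross)
post_mean = solved.T @ t_train
post_cov = K_grid - K_cross.T @ solved       # covariance of the latent (noise-free) function
post_samples = rng.multivariate_normal(post_mean, post_cov + jitter, size=5)
```

Plotting `post_mean` together with a band of two posterior standard deviations, and overlaying a few rows of `prior_samples` and `post_samples`, gives the kind of panels shown in Fig. 31; adding $\beta^{-1}$ to the diagonal of `post_cov` would give the predictive covariance of new noisy targets instead of the latent function.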
Note how all desirable features for a robust model are present:
The model avoids overfitting even for very small datasets;
Mean and sampled functions fit closely to the observed data;
The variance gives a measure of prediction uncertainty, increasing away from the observations;
Hyperparameters are learned directly from data without the need for a validation dataset.
On the next page we will go through this last point, namely how to determine suitable values for the kernel hyperparameters.