# Probability Theory
Probability theory underpins all of the foundational machine learning developments we will treat in this course. We start by defining two simple and intuitive rules from which essentially every other concept is derived.
## Rules of probability
Suppose we pick at random a very large number $N$ of paired observations of two random variables $X$ and $Y$, where $X$ can take the values $x_i$ with $i = 1, \dots, M$ and $Y$ the values $y_j$ with $j = 1, \dots, L$. The grid below shows all possible “bins” our draws can fall into, with five possible values for $X$ ($M = 5$) and three for $Y$ ($L = 3$).
Fig. 1 Two random variables, with possible values on a grid and some sampled observations (gray)
You can take each gray circle above to be one of the $N$ draws. Writing $n_{ij}$ for the number of draws landing in the cell where $X = x_i$ and $Y = y_j$, and $c_i = \sum_j n_{ij}$ for the total number of draws in column $i$, we can define three quantities:
**Joint probability**

The probability of observing $X = x_i$ and $Y = y_j$ at the same time:

$$p(X = x_i, Y = y_j) = \frac{n_{ij}}{N}$$
**Marginal probability**

The probability of observing $X = x_i$, regardless of the value of $Y$:

$$p(X = x_i) = \frac{c_i}{N}$$
**Conditional probability**

The probability of observing $Y = y_j$ given that we have observed $X = x_i$:

$$p(Y = y_j \mid X = x_i) = \frac{n_{ij}}{c_i}$$
All three can be read off from the shaded regions in the figure above.
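To make this concrete, these quantities can be estimated by simple counting. Below is a minimal sketch (assuming NumPy, with an arbitrary made-up distribution over the grid) that draws samples for $M = 5$ and $L = 3$ and computes the empirical joint, marginal, and conditional probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, N = 5, 3, 100_000  # grid size and number of draws

# An arbitrary ground-truth joint distribution over the M x L grid
true_joint = rng.random((M, L))
true_joint /= true_joint.sum()

# Draw N paired observations (X, Y) according to the true joint
flat = rng.choice(M * L, size=N, p=true_joint.ravel())
i, j = np.unravel_index(flat, (M, L))

# n[i, j]: number of draws landing in cell (i, j)
n = np.zeros((M, L))
np.add.at(n, (i, j), 1)

joint = n / N                                    # p(X=x_i, Y=y_j) = n_ij / N
marginal_x = n.sum(axis=1) / N                   # p(X=x_i) = c_i / N
conditional = n / n.sum(axis=1, keepdims=True)   # p(Y=y_j | X=x_i) = n_ij / c_i
```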
From these three observations we can derive the two crucial rules of probability, namely the sum and the product rules:
**Sum rule**

From the joint, we can marginalize variables out by summing over all other variables:

$$p(X = x_i) = \sum_{j=1}^{L} p(X = x_i, Y = y_j)$$
**Product rule**

We can recover the joint probability by combining a marginal and a conditional:

$$p(X = x_i, Y = y_j) = p(Y = y_j \mid X = x_i)\, p(X = x_i)$$
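Continuing the counting sketch from above, both rules can be verified directly on the empirical tables, since they are identities of the counts $n_{ij}$:

```python
# Sum rule: the marginal equals the joint summed over all values of Y
assert np.allclose(marginal_x, joint.sum(axis=1))

# Product rule: the joint equals the conditional times the marginal
# (assumes every column received at least one draw, so c_i > 0)
assert np.allclose(joint, conditional * marginal_x[:, None])
```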
**Further Reading**

Read Section 1.2 up until 1.2.1 (pages 12-17) of [bishop-prml]. Try to see how the conditional and marginal histograms of Figure 1.11 arise from the points sampled for the two variables $x$ and $y$.
## Probability densities
We now move to expressing probabilities over continuous variables. In this case it no longer makes sense to associate a probability with any single value of the variable, as there are infinitely many possibilities.
Instead, we can only compute the probability that a value falls somewhere within a range. To do this, we introduce a probability density $p(x)$, defined so that the probability of $x$ falling within an infinitesimal interval $(x, x + \delta x)$ is $p(x)\,\delta x$ as $\delta x \to 0$. The probability that $x$ lies in the interval $(a, b)$ is then

$$p(x \in (a, b)) = \int_a^b p(x)\,\mathrm{d}x$$
Furthermore, probability densities are always subject to two constraints:

$$p(x) \geq 0, \qquad \int_{-\infty}^{\infty} p(x)\,\mathrm{d}x = 1$$

which makes intuitive sense: probabilities cannot be negative, and once every possibility is accounted for they should integrate to one.
The same rules of probability are also valid for probability densities, that is:

**Rules of probability for densities**

$$p(x) = \int p(x, y)\,\mathrm{d}y, \qquad p(x, y) = p(y \mid x)\, p(x)$$
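These continuous rules can be checked with numerical integration. The sketch below (assuming NumPy; the bivariate Gaussian with correlation $\rho$ is an arbitrary example density) verifies the normalization constraint and the sum rule on a grid:

```python
import numpy as np

# Grid wide enough to capture virtually all of the probability mass
x = np.linspace(-8.0, 8.0, 401)
y = np.linspace(-8.0, 8.0, 401)
X, Y = np.meshgrid(x, y, indexing="ij")

# Example density p(x, y): standard bivariate Gaussian with correlation rho
rho = 0.6
p_xy = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2)))
p_xy /= 2 * np.pi * np.sqrt(1 - rho**2)

# Normalization: integrating over both variables gives ~1
# (np.trapezoid is the NumPy 2.0 name; use np.trapz on older versions)
print(np.trapezoid(np.trapezoid(p_xy, y, axis=1), x))

# Sum rule: integrating y out recovers the standard normal marginal p(x)
p_x = np.trapezoid(p_xy, y, axis=1)
p_x_exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print(np.max(np.abs(p_x - p_x_exact)))  # ~0
```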
## Expectations, Monte Carlo approximation
Consider a function $f(x)$ of a random variable $x$ with density $p(x)$. The average value of $f(x)$ under $p(x)$ is called the expectation of $f$:

**Expectation of a function**

$$\mathbb{E}[f] = \int p(x) f(x)\,\mathrm{d}x$$
Sometimes we do not have an exact expression for $p(x)$, or the integral above is intractable. If we can draw samples from $p(x)$, we can instead approximate the expectation with a finite average:

**Monte Carlo approximation**

$$\mathbb{E}[f] \approx \frac{1}{N} \sum_{n=1}^{N} f(x_n), \qquad x_n \sim p(x)$$

which becomes exact in the limit $N \to \infty$.
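As a minimal sketch of the approximation (assuming NumPy, with a standard normal $p(x)$ and $f(x) = x^2$, for which the exact value is $\mathbb{E}[f] = 1$):

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    return x**2  # example function; E[f] = 1 under a standard normal

for N in (10, 1_000, 100_000):
    x_n = rng.standard_normal(N)   # samples x_n ~ p(x)
    estimate = f(x_n).mean()       # (1/N) * sum_n f(x_n)
    print(f"N = {N:>7}: estimate = {estimate:.4f}")
```

As $N$ grows the estimate settles around the exact value of 1.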
**Further Reading**

The Monte Carlo approximation can be seen as propagating the uncertainty over $x$ through the function $f(x)$. You can read more about this in [MUDE].
## Variances and covariances
Again for a function $f(x)$ of a random variable $x$, we can measure how much $f(x)$ varies around its expected value:

**Variance of a function**

$$\mathrm{var}[f] = \mathbb{E}\!\left[\left(f(x) - \mathbb{E}[f(x)]\right)^2\right] = \mathbb{E}\!\left[f(x)^2\right] - \mathbb{E}[f(x)]^2$$
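Both forms of the variance agree on samples, as a quick sketch shows (assuming NumPy, again with $f(x) = x^2$ and a standard normal $x$, for which $\mathrm{var}[f] = \mathbb{E}[x^4] - \mathbb{E}[x^2]^2 = 3 - 1 = 2$):

```python
import numpy as np

rng = np.random.default_rng(0)
f_x = rng.standard_normal(1_000_000) ** 2  # f(x) = x^2, x ~ N(0, 1)

var_def = np.mean((f_x - f_x.mean()) ** 2)      # E[(f - E[f])^2]
var_alt = np.mean(f_x**2) - np.mean(f_x) ** 2   # E[f^2] - E[f]^2
print(var_def, var_alt)  # both ~2.0
```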
Through a similar argument we can compute the covariance between two random variables $x$ and $y$:

**Covariance between two variables**

$$\mathrm{cov}[x, y] = \mathbb{E}_{x,y}\!\left[(x - \mathbb{E}[x])(y - \mathbb{E}[y])\right] = \mathbb{E}_{x,y}[x y] - \mathbb{E}[x]\,\mathbb{E}[y]$$
which expresses how strongly $x$ and $y$ vary together. If the covariance between $x$ and $y$ is zero, the two variables are said to be uncorrelated; in particular, independent variables always have zero covariance.
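A short sketch of this (assuming NumPy, with arbitrary example variables) computes the sample covariance of a pair that varies together and a pair that does not:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000

x = rng.standard_normal(N)
y_corr = 0.8 * x + 0.6 * rng.standard_normal(N)  # varies together with x
y_indep = rng.standard_normal(N)                 # independent of x

def cov(a, b):
    return np.mean(a * b) - np.mean(a) * np.mean(b)  # E[ab] - E[a]E[b]

print(cov(x, y_corr))   # ~0.8
print(cov(x, y_indep))  # ~0.0, as expected for independent variables
```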