Bayesian Machine Learning
The following posts summarize the lessons of the Bayesian Machine Learning course taught at the Hebrew University. These summaries don't cover all of the material, but are fairly comprehensive on their own.
-
The Bayesian Philosophy
A high-level description of the frequentist and Bayesian approaches, their differences, and some of their shared qualities.
-
Decision Theory and Bayes-Optimal Estimators
An overview of Bayesian decision theory. As part of this post, we will look into estimation as well as proofs for the MLE, MMSE and MAP estimators.
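As a quick reminder (these are the standard definitions, not quotes from the post), the three estimators are the maximizer of the likelihood, the posterior mean, and the posterior mode:

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta}\, p(\mathcal{D} \mid \theta), \qquad \hat{\theta}_{\mathrm{MMSE}} = \mathbb{E}\left[\theta \mid \mathcal{D}\right], \qquad \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid \mathcal{D})$$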
-
The Gaussian Distribution
The Gaussian distribution is hands-down the most-used distribution in machine learning. This post will go through key aspects of the normal distribution and its representations.
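For reference, the univariate density in standard notation (which may differ slightly from the post's) is

$$\mathcal{N}\left(x \mid \mu, \sigma^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$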
-
Estimating the Gaussian Distribution
The math of estimating the parameters of a Gaussian using MLE as well as Bayesian inference, with some intuition regarding the effects of sample size and prior selection.
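As a point of reference (a standard result, stated here only as a reminder), the MLE for a univariate Gaussian is the sample mean together with the biased sample variance:

$$\hat{\mu}_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \hat{\mu}_{\mathrm{MLE}}\right)^2$$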
-
Linear Regression
An overview of the construction of linear regression, as well as its classical and Bayesian solutions.
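As a rough sketch of the setup (using generic notation that may not match the post), the model assumes Gaussian noise around a linear combination of basis functions, and the classical solution is ordinary least squares over the design matrix $\Phi$ and target vector $\mathbf{y}$:

$$y = w^\top \phi(x) + \varepsilon, \quad \varepsilon \sim \mathcal{N}\left(0, \sigma^2\right), \qquad \hat{w}_{\mathrm{LS}} = \left(\Phi^\top \Phi\right)^{-1} \Phi^\top \mathbf{y}$$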
-
Equivalent Form for Bayesian Linear Regression
The construction of an equivalent form for Bayesian linear regression, which is helpful when there are more features than data points.
-
Evidence Function
The evidence function (or marginal likelihood) is one of the cornerstones of Bayesian machine learning. This post shows the construction of the evidence and how it can be used in the context of Bayesian linear regression.
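In generic notation (a reminder rather than a quote from the post), the evidence is the likelihood with the parameters marginalized out under the prior:

$$p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta$$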
-
Kernels
The kernel trick allows us to move from regression over a predefined set of basis functions to regression in infinite spaces. All of this is predicated on understanding what a kernel even is and how to construct it. In this post, we will see exactly how to do this and how to use kernels for regression.
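Loosely speaking (and in notation that may differ from the post), a kernel is an inner product between feature maps, so a positive semi-definite function of two inputs stands in for regression over some, possibly infinite, set of basis functions:

$$k(x, x') = \phi(x)^\top \phi(x')$$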
-
More on Kernel Regression
Having defined kernels, this post delves into how such kernels can be used in the context of linear regression. This results in an extremely powerful model, but also introduces computational problems when confronted with vast amounts of data. To overcome these problems, we briefly introduce the subset of data and subset of regressors methods, as well as random Fourier feature approximations, for kernel machines.
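As a tiny illustration of the last of these, here is a minimal sketch of random Fourier features for an RBF kernel; the function name, parameters, and lengthscale choice are illustrative and not taken from the post:

```python
import numpy as np

def rff_features(X, num_features=200, lengthscale=1.0, seed=0):
    """Map X (n x d) to random Fourier features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the RBF kernel's (Gaussian) spectral density
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Z @ Z.T approximates exp(-||x - x'||^2 / (2 * lengthscale^2))
X = np.random.randn(100, 3)
Z = rff_features(X)
K_approx = Z @ Z.T
```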
-
Gaussian Processes
Going one step further into kernel regression, Gaussian processes allow us to define distributions (i.e. priors) over the predictive functions themselves.
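Informally (and only as a reminder of the standard definition), writing $f \sim \mathcal{GP}(m, k)$ means that any finite collection of function values is jointly Gaussian:

$$\left(f(x_1), \ldots, f(x_n)\right)^\top \sim \mathcal{N}\left(m(X),\, K\right), \qquad K_{ij} = k(x_i, x_j)$$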
-
Discriminative Classification
In this post, we start talking about classification, finally moving on from the world of linear regression. It turns out that exchanging the continuous outputs for discrete ones breaks much of the math we saw in the world of linear regression.
-
Generative Classification
Discriminative classification, while simple, is quite hard to treat in the Bayesian framework. Generative classification is slightly more forgiving, and is the focus of this post.
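As a reminder of the general recipe (in notation that may differ from the post), the generative route models the class-conditional densities and class priors, and classifies via Bayes' rule:

$$p(y = c \mid x) = \frac{p(x \mid y = c)\, p(y = c)}{\sum_{c'} p(x \mid y = c')\, p(y = c')}$$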
-
Extras - Linear Algebra and Probability
Almost all of the material in linear algebra and probability needed to understand research in Bayesian machine learning.