Bayesian Machine Learning

The following posts contain summaries of the lessons in the Bayesian Machine Learning course taught at the Hebrew University. These summaries don’t contain all of the material, but are fairly comprehensive on their own.


  • The Bayesian Philosophy

    A high-level description of the frequentist and Bayesian approaches, their differences, and some of their shared qualities.

  • Decision Theory and Bayes-Optimal Estimators

    An overview of Bayesian decision theory. As part of this post, we will look at estimation as well as proofs for the MLE, MMSE, and MAP estimators.

  • The Gaussian Distribution

    The Gaussian distribution is hands-down the most-used distribution in machine learning. This post will go through key aspects of the normal distribution and its representations.

  • Estimating the Gaussian Distribution

    The math of estimating the parameters of a Gaussian using MLE as well as Bayesian inference, with some intuition regarding the effects of sample size and prior selection.

  • Linear Regression

    An overview of the construction of linear regression as well as its classical and Bayesian solutions (a short sketch of the Bayesian solution appears at the end of this page).

  • Equivalent Form for Bayesian Linear Regression

    The construction of an equivalent form for Bayesian linear regression, which is helpful when there are more features than data points.

  • Evidence Function

    The evidence function (or marginal likelihood) is one of the cornerstones of Bayesian machine learning. This post shows the construction of the evidence and how it can be used in the context of Bayesian linear regression.

  • Kernels

    The kernel trick allows us to move from regression over a predefined set of basis functions to regression in infinite-dimensional spaces. All of this is predicated on understanding what a kernel even is and how to construct one. In this post, we will see exactly how to do this and how to use kernels for regression.

  • More on Kernel Regression

    Having defined kernels, this post delves into how they can be used in the context of linear regression. This results in an extremely powerful model, but also introduces computational problems when confronted with vast amounts of data. To overcome these problems, we briefly introduce the subset-of-data and subset-of-regressors methods, as well as random Fourier feature estimates for kernel machines.

  • Gaussian Processes

    Going one step further into kernel regression, Gaussian processes allow us to define distributions (i.e. priors) over the predictive functions themselves.

  • Discriminative Classification

    In this post, we start talking about classification, finally moving on from the world of linear regression. It turns out that exchanging the continuous outputs for discrete ones invalidates much of the math we saw in the world of linear regression.

  • Generative Classification

    Discriminative classification, while simple, is quite hard to treat in the Bayesian framework. Generative classification is slightly more forgiving, and is the focus of this post.

  • Extras - Linear Algebra and Probability

    Almost all of the material in linear algebra and probability needed to understand research in Bayesian machine learning.
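
As a small taste of what these posts cover, here is a minimal Python sketch of the Bayesian linear regression solution referenced above, assuming a zero-mean Gaussian prior over the weights with precision alpha and Gaussian observation noise with precision beta. The function names, the polynomial basis, and the particular values of alpha and beta are illustrative choices, not code from the course itself.

```python
import numpy as np

def posterior(Phi, y, alpha=1.0, beta=25.0):
    """Posterior over weights w for y = Phi @ w + noise,
    given the prior w ~ N(0, alpha^-1 I) and noise precision beta."""
    d = Phi.shape[1]
    # Standard conjugate-Gaussian result: S_N = (alpha*I + beta*Phi^T Phi)^-1
    S = np.linalg.inv(alpha * np.eye(d) + beta * Phi.T @ Phi)
    # Posterior mean: m_N = beta * S_N Phi^T y
    m = beta * S @ Phi.T @ y
    return m, S

def predictive(phi_star, m, S, beta=25.0):
    """Predictive mean and variance at a new feature vector phi_star."""
    mean = phi_star @ m
    var = 1.0 / beta + phi_star @ S @ phi_star
    return mean, var

# Toy usage: noisy sine data with a polynomial basis (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=30)
Phi = np.vander(x, N=6, increasing=True)              # features 1, x, ..., x^5
m, S = posterior(Phi, y)
phi_star = np.vander(np.array([0.3]), N=6, increasing=True)[0]
print(predictive(phi_star, m, S))
```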