Bayesian Machine Learning
The following posts summarize the lessons of the Bayesian Machine Learning course taught at the Hebrew University. These summaries don't cover all of the material, but are fairly comprehensive on their own.
-
The Bayesian Philosophy
A high-level description of the frequentist and Bayesian approaches, their differences, and some of their shared qualities.
-
Decision Theory and Bayes-Optimal Estimators
An overview of Bayesian decision theory. As part of this post, we will look into estimation as well as proofs for the MLE, MMSE and MAP estimators.
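As a quick reminder (these are the standard definitions, not quotes from the post), the three estimators are the maximizer of the likelihood, the posterior mean, and the posterior mode:

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta}\, p(\mathcal{D} \mid \theta), \qquad \hat{\theta}_{\mathrm{MMSE}} = \mathbb{E}\left[\theta \mid \mathcal{D}\right], \qquad \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid \mathcal{D})$$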
-
The Gaussian Distribution
The Gaussian distribution is hands-down the most-used distribution in machine learning. This post will go through key aspects of the normal distribution and its representations.
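For reference, the univariate density in standard notation (which may differ slightly from the post's) is

$$\mathcal{N}\left(x \mid \mu, \sigma^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$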
-
Estimating the Gaussian Distribution
The math of estimating the parameters of a Gaussian using MLE as well as Bayesian inference, with some intuition regarding the effects of sample size and prior selection.
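As a point of reference (a standard result, stated here only as a reminder), the MLE for a univariate Gaussian is the sample mean together with the biased sample variance:

$$\hat{\mu}_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \hat{\mu}_{\mathrm{MLE}}\right)^2$$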
-
Linear Regression
An overview of the construction of linear regression, as well as its classical and Bayesian solutions.
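As a rough sketch of the setup (using generic notation that may not match the post), the model assumes Gaussian noise around a linear combination of basis functions, and the classical solution is ordinary least squares over the design matrix $\Phi$ and target vector $\mathbf{y}$:

$$y = w^\top \phi(x) + \varepsilon, \quad \varepsilon \sim \mathcal{N}\left(0, \sigma^2\right), \qquad \hat{w}_{\mathrm{LS}} = \left(\Phi^\top \Phi\right)^{-1} \Phi^\top \mathbf{y}$$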
-
Equivalent Form for Bayesian Linear Regression
The construction of an equivalent form for Bayesian linear regression, which is helpful when there are more features than data points.
-
Evidence Function
The evidence function (or marginal likelihood) is one of the cornerstones of Bayesian machine learning. This post shows the construction of the evidence and how it can be used in the context of Bayesian linear regression.
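In generic notation (a reminder rather than a quote from the post), the evidence is the likelihood with the parameters marginalized out under the prior:

$$p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta$$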
-
Kernels
The kernel trick allows us to move from regression over a predefined set of basis functions to regression in infinite spaces. All of this is predicated on understanding what a kernel even is and how to construct it. In this post, we will see exactly how to do this and how to use kernels for regression.
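Loosely speaking (and in notation that may differ from the post), a kernel is an inner product between feature maps, so a positive semi-definite function of two inputs stands in for regression over some, possibly infinite, set of basis functions:

$$k(x, x') = \phi(x)^\top \phi(x')$$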
-
More on Kernel Regression
Having defined kernels, this post delves into how such kernels can be used in the context of linear regression. This results in an extremely powerful model, but also introduces computational problems when confronted with vast amounts of data. To overcome these problems, we briefly introduce the subset of data and subset of regressors methods, as well as random Fourier feature approximations, for kernel machines.
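As a tiny illustration of the last of these, here is a minimal sketch of random Fourier features for an RBF kernel; the function name, parameters, and lengthscale choice are illustrative and not taken from the post:

```python
import numpy as np

def rff_features(X, num_features=200, lengthscale=1.0, seed=0):
    """Map X (n x d) to random Fourier features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the RBF kernel's (Gaussian) spectral density
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Z @ Z.T approximates exp(-||x - x'||^2 / (2 * lengthscale^2))
X = np.random.randn(100, 3)
Z = rff_features(X)
K_approx = Z @ Z.T
```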
-
Gaussian Processes
Going one step further into kernel regression, Gaussian processes allow us to define distributions (i.e. priors) over the predictive functions themselves.
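Informally (and only as a reminder of the standard definition), writing $f \sim \mathcal{GP}(m, k)$ means that any finite collection of function values is jointly Gaussian:

$$\left(f(x_1), \ldots, f(x_n)\right)^\top \sim \mathcal{N}\left(m(X),\, K\right), \qquad K_{ij} = k(x_i, x_j)$$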
-
Discriminative Classification
In this post, we start talking about classification, finally moving on from the world of linear regression. It turns out that exchanging the continuous outputs for discrete ones breaks much of the math we saw in the world of linear regression.
-
Generative Classification
Discriminative classification, while simple, is quite hard to treat in the Bayesian framework. Generative classification is slightly more forgiving, and is the focus of this post.
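As a reminder of the general recipe (in notation that may differ from the post), the generative route models the class-conditional densities and class priors, and classifies via Bayes' rule:

$$p(y = c \mid x) = \frac{p(x \mid y = c)\, p(y = c)}{\sum_{c'} p(x \mid y = c')\, p(y = c')}$$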
-
Extras - Linear Algebra and Probability
Almost all of the material in linear algebra and probability needed to understand research in Bayesian machine learning.